Methods For Rapid Forensic Analysis Of Mitochondrial DNA

ABSTRACT

The present invention provides methods for rapid forensic analysis of mitochondrial DNA by amplification of a segment of mitochondrial DNA containing restriction sites, digesting the mitochondrial DNA segments with restriction enzymes, determining the molecular masses of the restriction fragments and comparing the molecular masses with the molecular masses of theoretical restriction digests of known mitochondrial DNA sequences stored in a database.

RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 12/049,949 filed Mar. 17, 2008, which is a continuation of U.S. patent application Ser. No. 10/853,660 filed May 25, 2004, the disclosure of each of which is incorporated by reference in its entirety herein.

STATEMENT OF GOVERNMENT SUPPORT

This invention was made with United States Government support under DARPA contract MDA-972-03C-112. The United States Government has certain rights in the invention.

FIELD OF THE INVENTION

This invention relates to the field of mitochondrial DNA analysis. The invention enables rapid and accurate forensic analysis by using mass spectrometry to characterize informative regions of mitochondrial DNA.

BACKGROUND OF THE INVENTION

Mitochondrial DNA (mtDNA) is found in eukaryotes and differs from nuclear DNA in its location, its sequence, its quantity in the cell, and its mode of inheritance. The nucleus of the cell contains two sets of 23 chromosomes, one paternal set and one maternal set. However, cells may contain hundreds to thousands of mitochondria, each of which may contain several copies of mtDNA. Nuclear DNA has many more bases than mtDNA, but mtDNA is present in many more copies than nuclear DNA. This characteristic of mtDNA is useful in situations where the amount of DNA in a sample is very limited. Typical sources of DNA recovered from crime scenes include hair, bones, teeth, and body fluids such as saliva, semen, and blood.

In humans, mitochondrial DNA is inherited strictly from the mother (Case, J. T. and Wallace, D. C., Somatic Cell Genetics, 1981, 7, 103-108; Giles, R. E. et al. Proc. Natl. Acad. Sci. 1980, 77, 6715-6719; Hutchison, C. A. et al. Nature, 1974, 251, 536-538). Thus, the mtDNA sequences obtained from maternally related individuals, such as a brother and a sister or a mother and a daughter, will exactly match each other in the absence of a mutation. This characteristic of mtDNA is advantageous in missing persons cases as reference mtDNA samples can be supplied by any maternal relative of the missing individual (Ginther, C. et al. Nature Genetics, 1992, 2, 135-138; Holland, M. M. et al. Journal of Forensic Sciences, 1993, 38, 542-553; Stoneking, M. et al. American Journal of Human Genetics, 1991, 48, 370-382).

The human mtDNA genome is approximately 16,569 bases in length and has two general regions: the coding region and the control region. The coding region is responsible for the production of various biological molecules involved in the process of energy production in the cell and includes about 37 genes (22 transfer RNAs, 2 ribosomal RNAs, and 13 peptides), with very little intergenic sequence and no introns. The control region is responsible for regulation of the mtDNA molecule. Two regions of mtDNA within the control region have been found to be highly polymorphic, or variable, within the human population (Greenberg, B. D. et al. Gene, 1983, 21, 33-49). These two regions are termed “hypervariable Region I” (HV1), which has an approximate length of 342 base pairs (bp), and “hypervariable Region II” (HV2), which has an approximate length of 268 bp. Forensic mtDNA examinations are performed using these two regions because of the high degree of variability found among individuals.

There exists a need for rapid identification of humans wherein human remains and/or biological samples are analyzed. Such remains or samples may be associated with war-related casualties, aircraft crashes, and acts of terrorism, for example. Analysis of mtDNA enables a rule-in/rule-out identification process for persons for whom DNA profiles from a maternal relative are available. Human identification by analysis of mtDNA can also be applied to human remains and/or biological samples obtained from crime scenes.

The process of human identification is a common objective of forensics investigations. As used herein, “forensics” is the study of evidence discovered at a crime or accident scene and used in a court of law. “Forensic science” is any science used for the purposes of the law, in particular the criminal justice system, and therefore provides impartial scientific evidence for use in the courts of law, and in a criminal investigation and trial. Forensic science is a multidisciplinary subject, drawing principally from chemistry and biology, but also from physics, geology, psychology and social science, for example.

Forensic scientists generally use two highly variable regions of human mtDNA for analysis. These regions are designated “hypervariable regions 1 and 2” (HV1 and HV2—which contain 341 and 267 base pairs respectively). These hypervariable regions, or portions thereof, provide one non-limiting example of mitochondrial DNA identifying amplicons.

A typical mtDNA analysis begins when total genomic DNA is extracted from biological material, such as a tooth, blood sample, or hair. The polymerase chain reaction (PCR) is then used to amplify, or create many copies of, the two hypervariable portions of the non-coding region of the mtDNA molecule, using flanking primers. Care is taken to eliminate the introduction of exogenous DNA during both the extraction and amplification steps via methods such as the use of pre-packaged sterile equipment and reagents, aerosol-resistant barrier pipette tips, gloves, masks, and lab coats, separation of pre- and post-amplification areas in the lab using dedicated reagents for each, ultraviolet irradiation of equipment, and autoclaving of tubes and reagent stocks. In casework, questioned samples are always processed before known samples and they are processed in different laboratory rooms. When adequate amounts of PCR product are amplified to provide all the necessary information about the two hypervariable regions, sequencing reactions are performed. These chemical reactions use each PCR product as a template to create a new complementary strand of DNA in which some of the nucleotide residues that make up the DNA sequence are labeled with dye. The strands created in this stage are then separated according to size by an automated sequencing machine that uses a laser to “read” the sequence, or order, of the nucleotide bases. Where possible, the sequences of both hypervariable regions are determined on both strands of the double-stranded DNA molecule, with sufficient redundancy to confirm the nucleotide substitutions that characterize that particular sample. At least two forensic analysts independently assemble the sequence and then compare it to a standard, commonly used, reference sequence. The entire process is then repeated with a known sample, such as blood or saliva collected from a known individual. 
The sequences from both samples, about 780 bases long each, are compared to determine if they match. The analysts assess the results of the analysis and determine if any portions of it need to be repeated. Finally, in the event of an inclusion or match, the SWGDAM mtDNA database, which is maintained by the FBI, is searched for the mitochondrial sequence that has been observed for the samples. The analysts can then report the number of observations of this type based on the nucleotide positions that have been read. A written report can be provided to the submitting agency.

Approximately 610 bp of mtDNA are currently sequenced in forensic mtDNA analysis. Recording and comparing mtDNA sequences would be difficult and potentially confusing if all of the bases were listed. Thus, mtDNA sequence information is recorded by listing only the differences with respect to a reference DNA sequence. By convention, human mtDNA sequences are described using the first complete published mtDNA sequence as a reference (Anderson, S. et al., Nature, 1981, 290, 457-465). This sequence is commonly referred to as the Anderson sequence. It is also called the Cambridge reference sequence or the Oxford sequence. Each base pair in this sequence is assigned a number. Deviations from this reference sequence are recorded as the number of the position demonstrating a difference and a letter designation of the different base. For example, a transition from A to G at Position 263 would be recorded as 263 G. If deletions or insertions of bases are present in the mtDNA, these differences are denoted as well.
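By way of illustration only, the difference-based notation described above can be sketched as follows. The function name and the ten-base toy sequences are hypothetical and are not real mtDNA; the sketch handles substitutions between equal-length sequences only and does not attempt the insertion and deletion notation.

```python
def list_differences(reference, sample):
    """Record substitutions as '<position> <base>', using 1-based positions."""
    if len(reference) != len(sample):
        raise ValueError("this sketch handles equal-length sequences only")
    pairs = zip(reference.upper(), sample.upper())
    return [
        f"{position} {sample_base}"
        for position, (ref_base, sample_base) in enumerate(pairs, start=1)
        if ref_base != sample_base
    ]


# Toy sequences (hypothetical): the sample differs from the reference
# by an A-to-G transition at position 5.
toy_reference = "GATCACAGGT"
toy_sample = "GATCGCAGGT"
print(list_differences(toy_reference, toy_sample))  # ['5 G']
```

A sample identical to the reference yields an empty list, which is the compact record the convention is designed to give.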

In the United States, there are seven laboratories currently conducting forensic mtDNA examinations: the FBI Laboratory; Laboratory Corporation of America (LabCorp) in Research Triangle Park, North Carolina; Mitotyping Technologies in State College, Pa.; the Bode Technology Group (BTG) in Springfield, Virginia; the Armed Forces DNA Identification Laboratory (AFDIL) in Rockville, Md.; BioSynthesis, Inc. in Lewisville, Texas; and Reliagene in New Orleans, La.

Mitochondrial DNA analyses have been admitted in criminal proceedings from these laboratories in the following states as of April 1999: Alabama, Arkansas, Florida, Indiana, Illinois, Maryland, Michigan, New Mexico, North Carolina, Pennsylvania, South Carolina, Tennessee, Texas, and Washington. Mitochondrial DNA has also been admitted and used in criminal trials in Australia, the United Kingdom, and several other European countries.

Since 1996, the number of individuals performing mitochondrial DNA analysis at the FBI Laboratory has grown from 4 to 12, with more personnel expected in the near future. Over 150 mitochondrial DNA cases have been completed by the FBI Laboratory as of March 1999, and dozens more await analysis. Forensic courses are being taught by the FBI Laboratory personnel and other groups to educate forensic scientists in the procedures and interpretation of mtDNA sequencing. More and more individuals are learning about the value of mtDNA sequencing for obtaining useful information from evidentiary samples that are small, degraded, or both. Mitochondrial DNA sequencing is becoming known not only as an exclusionary tool but also as a complementary technique for use with other human identification procedures. Mitochondrial DNA analysis will continue to be a powerful tool for law enforcement officials in the years to come as other applications are developed, validated, and applied to forensic evidence.

Presently, the forensic analysis of mtDNA is rigorous and labor-intensive. Currently, only 1-2 cases per month per analyst can be performed. Several molecular biological techniques are combined to obtain a mtDNA sequence from a sample. The steps of the mtDNA analysis process include primary visual analysis, sample preparation, DNA extraction, polymerase chain reaction (PCR) amplification, post-amplification quantification of the DNA, automated DNA sequencing, and data analysis. Another complicating factor in the forensic analysis of mtDNA is the occurrence of heteroplasmy wherein the pool of mtDNAs in a given cell is heterogeneous due to mutations in individual mtDNAs. There are two forms of heteroplasmy found in mtDNA. Sequence heteroplasmy (also known as point heteroplasmy) is the occurrence of more than one base at a particular position or positions in the mtDNA sequence. Length heteroplasmy is the occurrence of more than one length of a stretch of the same base in a mtDNA sequence as a result of insertion of nucleotide residues.

Heteroplasmy is a problem for forensic investigators since a sample from a crime scene can differ from a sample from a suspect by one base pair and this difference may be interpreted as sufficient evidence to eliminate that individual as the suspect. Hair samples from a single individual can contain heteroplasmic mutations at vastly different concentrations and even the root and shaft of a single hair can differ. The detection methods currently available to molecular biologists cannot detect low levels of heteroplasmy. Furthermore, if present, length heteroplasmy will adversely affect sequencing runs by resulting in an out-of-frame sequence that cannot be interpreted.

Mass spectrometry provides detailed information about the molecules being analyzed, including high mass accuracy. It is also a process that can be easily automated.

Several groups have described detection of PCR products using high resolution electrospray ionization-Fourier transform-ion cyclotron resonance mass spectrometry (ESI-FT-ICR MS). Accurate measurement of exact mass combined with knowledge of the number of at least one nucleotide allowed calculation of the total base composition for PCR duplex products of approximately 100 base pairs. (Aaserud et al., J. Am. Soc. Mass Spec., 1996, 7, 1266-1269; Muddiman et al., Anal. Chem., 1997, 69, 1543-1549; Wunschel et al., Anal. Chem., 1998, 70, 1203-1207; Muddiman et al., Rev. Anal. Chem., 1998, 17, 1-68). Electrospray ionization-Fourier transform-ion cyclotron resonance (ESI-FT-ICR) MS may be used to determine the mass of double-stranded, 500 base-pair PCR products via the average molecular mass (Hurst et al., Rapid Commun. Mass Spec. 1996, 10, 377-382). The use of matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry for characterization of PCR products has been described. (Muddiman et al., Rapid Commun. Mass Spec., 1999, 13, 1201-1204). However, the degradation of DNAs over about 75 nucleotides observed with MALDI limited the utility of this method.

U.S. Pat. No. 5,849,492 reports a method for retrieval of phylogenetically informative DNA sequences which comprise searching for a highly divergent segment of genomic DNA surrounded by two highly conserved segments, designing the universal primers for PCR amplification of the highly divergent region, amplifying the genomic DNA by PCR technique using universal primers, and then sequencing the gene to determine the identity of the organism.

U.S. Pat. No. 5,965,363 reports methods for screening nucleic acids for polymorphisms by analyzing amplified target nucleic acids using mass spectrometric techniques and to procedures for improving mass resolution and mass accuracy of these methods.

WO 99/14375 reports methods, PCR primers and kits for use in analyzing preselected DNA tandem nucleotide repeat alleles by mass spectrometry.

WO 98/12355 reports methods of determining the mass of a target nucleic acid by mass spectrometric analysis, by cleaving the target nucleic acid to reduce its length, making the target single-stranded and using MS to determine the mass of the single-stranded shortened target. Also reported are methods of preparing a double-stranded target nucleic acid for MS analysis comprising amplification of the target nucleic acid, binding one of the strands to a solid support, releasing the second strand and then releasing the first strand which is then analyzed by MS. Kits for target nucleic acid preparation are also reported.

PCT WO97/33000 reports methods for detecting mutations in a target nucleic acid by nonrandomly fragmenting the target into a set of single-stranded nonrandom length fragments and determining their masses by MS.

U.S. Pat. No. 5,605,798 reports a fast and highly accurate mass spectrometer-based process for detecting the presence of a particular nucleic acid in a biological sample for diagnostic purposes.

WO 98/20166 reports processes for determining the sequence of a particular target nucleic acid by mass spectrometry. Processes for detecting a target nucleic acid present in a biological sample by PCR amplification and mass spectrometry detection are disclosed, as are methods for detecting a target nucleic acid in a sample by amplifying the target with primers that contain restriction sites and tags, extending and cleaving the amplified nucleic acid, and detecting the presence of extended product, wherein the presence of a DNA fragment of a mass different from wild-type is indicative of a mutation. Methods of sequencing a nucleic acid via mass spectrometry methods are also described.

WO 97/37041, WO 99/31278 and U.S. Pat. No. 5,547,835 report methods of sequencing nucleic acids using mass spectrometry. U.S. Pat. Nos. 5,622,824, 5,872,003 and 5,691,141 report methods, systems and kits for exonuclease-mediated mass spectrometric sequencing.

There is a need for a mitochondrial DNA forensic analysis which is both specific and rapid, and in which no nucleic acid sequencing is required. The present invention addresses this need, among others.

SUMMARY OF THE INVENTION

The present invention is directed to methods of forensic analysis of mitochondrial DNA comprising: amplifying a segment of mitochondrial DNA containing a plurality of restriction sites and flanked by a pair of primers to produce an amplification product, digesting the amplification product with a plurality of restriction enzymes to produce a plurality of restriction digest products, determining the molecular mass of each member of the plurality of restriction digest products, generating a fragment coverage map from the molecular masses and comparing the fragment coverage map with a plurality of theoretical fragment coverage maps contained in a database stored on a computer readable medium.

The present invention is also directed to primer pair compositions used to amplify mitochondrial DNA for the forensic method and to isolated mitochondrial DNA amplicons obtained by amplification of mitochondrial DNA with the primer pair compositions.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of base composition determination using nucleotide analog “tags” to determine base compositions.

FIG. 2 shows the deconvoluted mass spectra of a Bacillus anthracis region with and without the mass tag phosphorothioate A (A*). The two spectra differ in that the measured molecular weight of the mass tag-containing sequence is greater than that of the unmodified sequence.

FIG. 3 indicates the process of mtDNA analysis. After amplification by PCR (210), the PCR products were subjected to restriction digests (220) with RsaI for HV1 and a combination of HpaII, HpyCH4IV, PacI and EaeI for HV2 in order to obtain amplicon segments suitable for analysis by FTICR-MS (240). The data were processed to obtain mass data for each amplicon segment (250) which were then compared to the masses calculated for theoretical digests from the FBI mtDNA database by a scoring scheme (260).

FIG. 4 is a comparison of two mass spectra which indicates that the use of exo(−) pfu polymerase prevents addition of non-templated adenosine residues and results in a strong signal, relative to the use of the commonly used Amplitaq™ gold polymerase.

FIG. 5 indicates that gel electrophoresis confirms that exo(−) pfu polymerase is equally effective as a standard polymerase in amplification of mtDNA obtained from blood, fingernail and saliva samples.

FIG. 6 exhibits two plots that indicate positions of cleavage of human mtDNA obtained with different panels of restriction endonucleases. The modified panel wherein EaeI and PacI are replaced with HaeIII and HpyCH4V respectively, results in better spacing of conserved restriction sites.

FIG. 7 is an agarose gel electrophoresis photo confirming the activity of restriction endonucleases: EaeI, HpyCH4IV, HpyCH4IV, HpaII, PacI and HaeIII on HV2 amplicon from a mtDNA preparation obtained from a blood sample (Seracare N31773).

FIG. 8 is an agarose gel electrophoresis photo confirming that the primers designed to amplify the 12 non-control regions (Regions R1-R12) produce amplicons of the expected sizes.

FIG. 9 is an agarose gel electrophoresis photo indicating the sensitivity of the HV1 and HV2 primer pairs assessed against DNA isolated from human blood. A PCR product is detectable down to between 160 pg and 1.6 ng for both HV1 and HV2 primer pairs.

FIG. 10 is an agarose gel electrophoresis photo indicating that PCR products are obtained for each of the 36 samples described in Example 13 when amplified with HV1 primers.

DESCRIPTION OF EMBODIMENTS

The present invention provides, inter alia, methods for forensic analysis of mitochondrial DNA. A region of mitochondrial DNA which contains one or more restriction sites is selected to provide optimal distinguishing capability which enables forensic conclusions to be drawn. A relational database of known mitochondrial DNA sequences is then populated with the results of theoretical restriction digestion reactions. One or more primer pairs are then selected to amplify the region of mitochondrial DNA and the amplification product is digested with one or more restriction enzymes which are chosen to yield restriction fragments of up to about 150 base pairs that are amenable to molecular mass analysis. The molecular masses of all of the restriction fragments are then measured and the results are compared with the results calculated for the theoretical restriction digestions of all of the entries in the relational database. The results of the comparison enable a forensic conclusion to be drawn.
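By way of non-limiting illustration, the theoretical digestion and database comparison described above can be sketched as follows. The sketch is a simplification under stated assumptions: the database entries are short hypothetical sequences rather than mtDNA records, only the top strand is considered, fragments are compared by base composition rather than by measured mass, and the function names are invented for the example. The RsaI recognition sequence GTAC, cut between T and A, is palindromic, so searching one strand is sufficient for this enzyme.

```python
def digest(sequence, sites):
    """Cut `sequence` at every occurrence of each recognition site.

    `sites` maps a recognition sequence to the cut offset inside it
    (for RsaI, GT^AC, the offset is 2).
    """
    cuts = set()
    for site, offset in sites.items():
        start = sequence.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = sequence.find(site, start + 1)
    bounds = [0] + sorted(cuts) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def composition(fragment):
    """Base composition as a tuple of (A, C, G, T) counts."""
    return tuple(fragment.count(base) for base in "ACGT")


def profile(sequence, sites):
    """Sorted list of fragment compositions for one theoretical digest."""
    return sorted(composition(f) for f in digest(sequence, sites))


RSAI = {"GTAC": 2}

# Hypothetical database entries (toy sequences, not real mtDNA).
database = {
    "entry_1": "AAGTACCTTGGTACAA",
    "entry_2": "AAGTACCTCGGTACAA",
}
theoretical = {name: profile(seq, RSAI) for name, seq in database.items()}

# In practice the observed profile would be derived from measured masses;
# here it is computed from a sequence purely to exercise the comparison.
observed = profile("AAGTACCTCGGTACAA", RSAI)
matches = [name for name, prof in theoretical.items() if prof == observed]
print(matches)  # ['entry_2']
```

The two toy entries differ by a single base in the middle fragment, which is enough to change that fragment's composition and therefore to separate the entries.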

In one embodiment, more than one region can be analyzed to draw a forensic conclusion via a triangulation strategy. For example, it is possible that analysis of one region of DNA obtained from a crime scene yields several possible matches to entries in a relational database. In this case, depending on the objective of the individual forensic analysis, it may be advantageous to carry out one or more additional analyses of different mtDNA regions. Examples of such mtDNA regions include, but are not limited to, a portion of HV1, HV2, R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11 and R12 (coordinates for each of these defined regions, relative to the Anderson sequence, are given in Table 2). Thus, in this embodiment, any combination of two or more regions of mtDNA is used to provide optimal distinguishing capability and provide an improved confidence level for the forensic analysis.

In another embodiment, the relational database of known mitochondrial DNA sequences is populated with base compositions of the theoretical restriction fragments obtained from theoretical digestion of each member of the database. Then the base compositions of each of the restriction fragments of the experimentally determined molecular masses are determined. The analysis may then end with a comparison of the experimentally determined base compositions with the base compositions of the theoretical digestions of each member of the database so that at least one base composition match or lack of a base composition match provides a forensic conclusion.

In another embodiment, one or more restriction enzymes are chosen to yield restriction fragments of up to about 50 base pairs, of up to about 100 base pairs, of up to about 150 base pairs, of up to about 200 base pairs, or of up to about 250 base pairs that are amenable to molecular mass analysis.

In another embodiment, the molecular masses of all or most (i.e., about 75%, about 80%, about 90%, about 99% or every fragment minus one fragment) of the restriction fragments are then measured and the results are compared with the results calculated for the theoretical restriction digestions of all of the entries in the relational database.
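One possible way to express a comparison in which only most of the fragments are measured is sketched below. The scoring function, the 25 ppm tolerance, and all of the masses are hypothetical values chosen for illustration; they are not the scoring scheme or data of this disclosure.

```python
def fraction_matched(observed, theoretical, tol_ppm=25.0):
    """Fraction of theoretical fragment masses that have an observed mass
    within `tol_ppm` parts per million."""
    if not theoretical:
        return 0.0
    hits = 0
    for expected in theoretical:
        if any(abs(mass - expected) / expected * 1e6 <= tol_ppm for mass in observed):
            hits += 1
    return hits / len(theoretical)


# Hypothetical masses in Da: four fragments are predicted, three are detected.
theoretical_masses = [12345.60, 23456.70, 34567.80, 45678.90]
observed_masses = [12345.70, 23456.50, 45679.00]

score = fraction_matched(observed_masses, theoretical_masses)
print(score)  # 0.75
```

Under these assumptions a database entry would be retained as a candidate when the score meets a chosen threshold such as 0.75, corresponding to "every fragment minus one" for a four-fragment digest.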

In some embodiments, the amplifying step is accomplished by using the polymerase chain reaction, and the polymerase chain reaction is catalyzed by a polymerase enzyme whose function is modified relative to a native polymerase. In some embodiments, the modified polymerase enzyme is exo(−) Pfu polymerase which catalyzes the addition of nucleotide residues to staggered restriction digest products to convert the staggered digest products to blunt-ended digest products.

Although the use of PCR is suitable, other nucleic acid amplification techniques may also be used, including ligase chain reaction (LCR) and strand displacement amplification (SDA).

Mass spectrometry (MS)-based detection of PCR products provides a means for determination of BCS which has several advantages. MS is intrinsically a parallel detection scheme without the need for radioactive or fluorescent labels, since every amplification product is identified by its molecular mass. The current state of the art in mass spectrometry is such that less than femtomole quantities of material can be readily analyzed to afford information about the molecular contents of the sample. An accurate assessment of the molecular mass of the material can be quickly obtained, irrespective of whether the molecular weight of the sample is several hundred, or in excess of one hundred thousand atomic mass units (amu) or Daltons. Intact molecular ions can be generated from amplification products using one of a variety of ionization techniques to convert the sample to gas phase. These ionization methods include, but are not limited to, electrospray ionization (ES), matrix-assisted laser desorption ionization (MALDI) and fast atom bombardment (FAB). For example, MALDI of nucleic acids, along with examples of matrices for use in MALDI of nucleic acids, are described in WO 98/54751. The accurate measurement of molecular mass for large DNAs is limited by the adduction of cations from the PCR reaction to each strand, resolution of the isotopic peaks from natural abundance ¹³C and ¹⁵N isotopes, and assignment of the charge state for any ion. The cations are removed by in-line dialysis using a flow-through chip that brings the solution containing the PCR products into contact with a solution containing ammonium acetate in the presence of an electric field gradient orthogonal to the flow. The latter two problems are addressed by operating with a resolving power of >100,000 and by incorporating isotopically depleted nucleotide triphosphates into the DNA. The resolving power of the instrument is also a consideration. 
At a resolving power of 10,000, the modeled signal from the [M-14H+]¹⁴⁻ charge state of an 84mer PCR product is poorly characterized and assignment of the charge state or exact mass is impossible. At a resolving power of 33,000, the peaks from the individual isotopic components are visible. At a resolving power of 100,000, the isotopic peaks are resolved to the baseline and assignment of the charge state for the ion is straightforward. The [¹³C,¹⁵N]-depleted triphosphates are obtained, for example, by growing microorganisms on depleted media and harvesting the nucleotides (Batey et al., Nucl. Acids Res., 1992, 20, 4515-4523).
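The resolving-power figures above can be rationalized with a rough calculation. The sketch below assumes a neutral mass of about 26,000 Da for an 84-mer strand (an illustrative round number, not a value taken from this disclosure) and treats the minimum useful resolving power as the m/z of the ion divided by the spacing between adjacent isotopic peaks. The function names are hypothetical.

```python
PROTON_MASS = 1.00728       # Da
ISOTOPE_SPACING = 1.00335   # Da, mass difference between 13C and 12C


def mz_negative_ion(neutral_mass, charge):
    """m/z of the [M - zH] ion with z negative charges."""
    return (neutral_mass - charge * PROTON_MASS) / charge


def min_resolving_power(neutral_mass, charge):
    """Resolving power (m / delta-m) at which adjacent isotopic peaks are
    separated by about one peak width."""
    peak_spacing = ISOTOPE_SPACING / charge
    return mz_negative_ion(neutral_mass, charge) / peak_spacing


ASSUMED_84MER_MASS = 26000.0  # Da, assumption for illustration only
required = min_resolving_power(ASSUMED_84MER_MASS, charge=14)
print(round(required))
```

Under these assumptions the threshold comes out at roughly 26,000, which is consistent with the isotopic peaks being unresolved at a resolving power of 10,000, visible at 33,000, and resolved to baseline at 100,000.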

While mass measurements of intact nucleic acid regions are believed to be adequate, tandem mass spectrometry (MSⁿ) techniques may provide more definitive information pertaining to molecular identity or sequence. Tandem MS involves the coupled use of two or more stages of mass analysis where both the separation and detection steps are based on mass spectrometry. The first stage is used to select an ion or component of a sample from which further structural information is to be obtained. The selected ion is then fragmented using, e.g., blackbody irradiation, infrared multiphoton dissociation, or collisional activation. For example, ions generated by electrospray ionization (ESI) can be fragmented using IR multiphoton dissociation. This activation leads to dissociation of glycosidic bonds and the phosphate backbone, producing two series of fragment ions, called the w-series (having an intact 3′ terminus and a 5′ phosphate following internal cleavage) and the a-Base series (having an intact 5′ terminus and a 3′ furan).

The second stage of mass analysis is then used to detect and measure the mass of these resulting fragments of product ions. Such ion selection followed by fragmentation routines can be performed multiple times so as to essentially completely dissect the molecular sequence of a sample.

If there are two or more targets of similar molecular mass, or if a single amplification reaction results in a product which has the same mass as two or more reference standards, they can be distinguished by using mass-modifying “tags.” In this embodiment of the invention, a nucleotide analog or “tag” is incorporated during amplification (e.g., a 5-(trifluoromethyl) deoxythymidine triphosphate) which has a different molecular weight than the unmodified base so as to improve distinction of masses. Such tags are described in, for example, PCT WO97/33000, which is incorporated herein by reference in its entirety. This further limits the number of possible base compositions consistent with any mass. For example, 5-(trifluoromethyl)deoxythymidine triphosphate can be used in place of dTTP in a separate nucleic acid amplification reaction. Measurement of the mass shift between a conventional amplification product and the tagged product is used to quantitate the number of thymidine nucleotides in each of the single strands. Because the strands are complementary, the number of adenosine nucleotides in each strand is also determined.

In another amplification reaction, the number of G and C residues in each strand is determined using, for example, the cytidine analog 5-methylcytosine (5-meC) or 5-propynylcytosine (propyne C). The combination of the A/T reaction and G/C reaction, followed by molecular weight determination, provides a unique base composition. This method is summarized in FIG. 1 and Table 1.

TABLE 1

Mass tag            Double strand   Single strand   Total Δmass   Base info     Base info      Total base comp.   Total base comp.
                    sequence        sequence        this strand   this strand   other strand   top strand         bottom strand

T* Δ (T*-T) = x     T*ACGT*ACGT*    T*ACGT*ACGT*    3x            3T            3A             3T 2A 2C 2G        3A 2T 2G 2C
                    AT*GCAT*GCA     AT*GCAT*GCA     2x            2T            2A

C* Δ (C*-C) = y     TAC*GTAC*GT     TAC*GTAC*GT     2x            2C            2G
                    ATGC*ATGC*A     ATGC*ATGC*A     2x            2C            2G
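The reasoning summarized in Table 1 can be sketched in a few lines. The function below is hypothetical; it takes the number of tagged T and tagged C residues counted in each strand and uses strand complementarity to fill in the remaining bases, so that the A count of one strand equals the T count of the other and the G count of one strand equals the C count of the other.

```python
def strand_compositions(t_top, t_bottom, c_top, c_bottom):
    """Derive the full base composition of both strands from tag counts."""
    top = {"A": t_bottom, "C": c_top, "G": c_bottom, "T": t_top}
    bottom = {"A": t_top, "C": c_bottom, "G": c_top, "T": t_bottom}
    return top, bottom


# Tag counts from Table 1: 3 T* and 2 C* in the top strand,
# 2 T* and 2 C* in the bottom strand.
top, bottom = strand_compositions(t_top=3, t_bottom=2, c_top=2, c_bottom=2)
print(top)     # {'A': 2, 'C': 2, 'G': 2, 'T': 3}
print(bottom)  # {'A': 3, 'C': 2, 'G': 2, 'T': 2}
```

These results correspond to the "3T 2A 2C 2G" and "3A 2T 2G 2C" total base compositions listed for the top and bottom strands.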

The mass tag phosphorothioate A (A*) was used to distinguish a Bacillus anthracis cluster. The untagged B. anthracis sequence (A₁₄G₉C₁₄T₉) had an average molecular weight of 14072.26, and the tagged sequence (A₁A*₁₃G₉C₁₄T₉) had an average molecular weight of 14281.11, so that each phosphorothioate A increased the average molecular weight by about 16.06, as determined by ESI-TOF MS. The deconvoluted spectra are shown in FIG. 2.
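The arithmetic behind this example can be checked directly. The sketch below uses the two molecular weights stated above and, as an assumption introduced for comparison, the standard atomic weights of sulfur (32.06) and oxygen (15.999), since a phosphorothioate replaces one non-bridging oxygen with sulfur. The variable names are hypothetical.

```python
UNTAGGED_MW = 14072.26   # A14 G9 C14 T9, from the text
TAGGED_MW = 14281.11     # A1 A*13 G9 C14 T9, from the text
TAGGED_RESIDUES = 13

# Average mass shift contributed by each phosphorothioate A.
shift_per_tag = (TAGGED_MW - UNTAGGED_MW) / TAGGED_RESIDUES

# Assumed theoretical shift: sulfur replaces oxygen (standard atomic weights).
THEORETICAL_SHIFT = 32.06 - 15.999

# Working backwards: the total shift divided by the theoretical per-tag
# shift recovers the number of tagged residues.
recovered_tags = round((TAGGED_MW - UNTAGGED_MW) / THEORETICAL_SHIFT)

print(shift_per_tag, THEORETICAL_SHIFT, recovered_tags)
```

The per-tag shift computed from the measured values agrees with the assumed theoretical shift to within about 0.01, and the count of 13 tagged adenosines is recovered from the mass difference alone.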

In another example, assume the measured molecular masses of each strand are 30,000.115 Da and 31,000.115 Da respectively, and the measured number of dT and dA residues are (30, 28) and (28, 30). If the molecular mass is accurate to 100 ppm, there are 7 possible combinations of dG+dC for each strand. However, if the measured molecular mass is accurate to 10 ppm, there are only 2 combinations of dG+dC, and at 1 ppm accuracy there is only one possible base composition for each strand.
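The narrowing effect of mass accuracy can be explored with a brute-force enumeration. The sketch below is illustrative only: the residue masses and end-group correction are approximate average values assumed for the example, the strand composition is hypothetical, and the candidate counts it produces are not expected to reproduce the figures given above.

```python
# Approximate average masses (Da) of nucleotide residues within a DNA chain.
# These values and the end correction are assumptions for illustration.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
END_CORRECTION = -61.96  # strand with 5'-OH and 3'-OH termini


def strand_mass(a, c, g, t):
    """Approximate average mass of a single strand with the given composition."""
    return (a * RESIDUE_MASS["A"] + c * RESIDUE_MASS["C"]
            + g * RESIDUE_MASS["G"] + t * RESIDUE_MASS["T"] + END_CORRECTION)


def gc_candidates(measured_mass, n_a, n_t, tol_ppm, max_count=120):
    """All (G, C) counts consistent with the measured mass, given known A and T."""
    window = measured_mass * tol_ppm / 1e6
    return [
        (g, c)
        for g in range(max_count + 1)
        for c in range(max_count + 1)
        if abs(strand_mass(n_a, c, g, n_t) - measured_mass) <= window
    ]


# Hypothetical strand: 28 dA, 30 dT, 20 dG, 20 dC, with a small simulated error.
measured = strand_mass(a=28, c=20, g=20, t=30) + 0.01
counts = {ppm: len(gc_candidates(measured, 28, 30, ppm)) for ppm in (1000, 100, 10, 1)}
print(counts)
```

In this sketch the true composition (20 dG, 20 dC) remains a candidate at every tolerance, while the number of competing candidates can only shrink as the tolerance tightens.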

Signals from the mass spectrometer may be input to a maximum-likelihood detection and classification algorithm such as is widely used in radar signal processing. Processing may end with a Bayesian classifier using log likelihood ratios developed from the observed signals and average background levels. Background signal strengths are estimated and used along with the matched filters to form signatures which are then subtracted. The maximum likelihood process is applied to this “cleaned up” data in a similar manner employing matched filters and a running-sum estimate of the noise-covariance for the cleaned up data.

In some embodiments, the mitochondrial DNA analyzed is human mitochondrial DNA obtained from human saliva, hair, blood, or nail. In other embodiments, the DNA analyzed can be obtained from an animal, a fungus, a parasite or a protozoan.

The present invention also comprises primer pairs which are designed to bind to highly conserved sequence regions of mitochondrial DNA that flank an intervening variable region such as the variable sections found within regions HV1, HV2, R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11 and R12 and yield amplification products which ideally provide enough variability to provide a forensic conclusion, and which are amenable to molecular mass analysis. By the term “highly conserved,” it is meant that the sequence regions exhibit from about 80 to 100%, or from about 90 to 100%, or from about 95 to 100% identity, or from about 80 to 99%, or from about 90 to 99%, or from about 95 to 99% identity. The molecular mass of a given amplification product provides a means of drawing a forensic conclusion due to the variability of the variable region. Thus, design of primers involves selection of a variable section with optimal variability in the mtDNA of different individuals.

In some embodiments, each member of the pair has at least 70%, at least 80%, at least 90%, at least 95%, or at least 99% sequence identity with the sequence of the corresponding member of any one or more of the following primer pair sequences: SEQ ID NOs: 8:9, 10:11, 12:13, 12:14, 12:15, 16:17, 18:19, 20:21, 22:23, 24:25, 26:27, 28:29, 30:31, 32:33, 34:35, 36:37, 38:39, 40:41, 42:43, 44:45, 42:46, 47:48, 18:49, 50:51, 22:52, 53:54, 55:56, 57:29, 58:31, 59:60, 61:62, 63:39, 40:64, 65:66, 67:68, 69:70, 12:68, 12:70, 67:15, 71:70, 69:15, and 69:68.

In some embodiments, the region of mitochondrial DNA comprises HV1, each member of the primer pair has at least 70% sequence identity with the sequence of the corresponding member of any one of the following primer pair sequences: SEQ ID NOs: 12:13, 12:14, 12:15, 16:17, 42:43, 42:46, 67:68, 69:70, 12:68, 12:70, 67:15, 71:70, 69:15, or 69:68, and the restriction enzyme is RsaI.

In some embodiments, the region of mitochondrial DNA comprises HV2, each member of the primer pair has at least 70% sequence identity with the sequence of the corresponding member of any one of the following primer pair sequences: SEQ ID NOs: 8:9, 10:11, 16:17, or 65:66, and the at least one restriction enzyme is HaeIII, HpaII, MfeI, or SspI, or HpaII, HpyCH4IV, PacI, or EaeI.

In some embodiments, the region of mitochondrial DNA comprises region R1, each member of the primer pair has at least 70% sequence identity with the sequence of the corresponding member of any one of the following primer pair sequences: SEQ ID NOs: 18:19 and 18:49, and at least one restriction enzyme is DdeI, MseI, HaeIII, or MboI.

In some embodiments, the region of mitochondrial DNA comprises region R2, each member of the primer pair has at least 70% sequence identity with the sequence of the corresponding member of any one of the following primer pair sequences: SEQ ID NOs: 20:21 and 50:51, and at least one restriction enzyme is DdeI, HaeIII, MboI, or MseI.

In some embodiments, the region of mitochondrial DNA comprises region R3, each member of the primer pair has at least 70% sequence identity with the sequence of the corresponding member of any one of the following primer pair sequences: SEQ ID NOs: 22:23 and 22:52, and at least one restriction enzyme is DdeI, MseI, MboI, or BanI.

In some embodiments, the region of mitochondrial DNA comprises region R4, each member of the primer pair has at least 70% sequence identity with the sequence of the corresponding member of any one of the following primer pair sequences: SEQ ID NOs: 24:25 and 53:54, and at least one restriction enzyme is DdeI, HpyCH4IV, MseI, or HaeIII.

In some embodiments, the region of mitochondrial DNA comprises region R5, each member of said primer pair has at least 70% sequence identity with the sequence of the corresponding member of any one of the following primer pair sequences: SEQ ID NOs: 26:27 and 55:56, and at least one restriction enzyme is AluI, BfaI, or MseI.

In some embodiments, the region of mitochondrial DNA comprises region R6, each member of the primer pair has at least 70% sequence identity with the sequence of the corresponding member of any one of the following primer pair sequences: SEQ ID NOs: 28:29 and 57:29, and at least one restriction enzyme is DdeI, HaeIII, MboI, MseI, or RsaI.

In some embodiments, the region of mitochondrial DNA comprises region R7, each member of said primer pair has at least 70% sequence identity with the sequence of the corresponding member of any one of the following primer pair sequences: SEQ ID NOs: 30:31 and 58:31, and at least one restriction enzyme is DdeI, HpaII, HaeIII, or MseI.

In some embodiments, the region of mitochondrial DNA comprises region R8, each member of the primer pair has at least 70% sequence identity with the sequence of the corresponding member of any one of the following primer pair sequences: SEQ ID NOs: 32:33 and 59:60, and at least one restriction enzyme is BfaI, DdeI, EcoRI, or MboI.

In some embodiments, the region of mitochondrial DNA comprises region R9, each member of the primer pair has at least 70% sequence identity with the sequence of the corresponding member of the following primer pair sequences: SEQ ID NOs: 34:35, and at least one restriction enzyme is BfaI, DdeI, HpaII, HpyCH4IV, or MboI.

In some embodiments, the region of mitochondrial DNA comprises region R10, each member of the primer pair has at least 70% sequence identity with the sequence of the corresponding member of the following primer pair sequences: SEQ ID NOs: 34:35, and at least one restriction enzyme is BfaI, HpaII, or MboI.

In some embodiments, the region of mitochondrial DNA comprises region R10, each member of the primer pair has at least 70% sequence identity with the sequence of the corresponding member of the following primer pair sequences: SEQ ID NOs: 36:37 and 61:62, and at least one restriction enzyme is BfaI, HpaII, or MboI.

In some embodiments, the region of mitochondrial DNA comprises region R11, each member of the primer pair has at least 70% sequence identity with the sequence of the corresponding member of the following primer pair sequences: SEQ ID NOs: 38:39 and 63:39, and at least one restriction enzyme is BfaI, DdeI, HpyCH4V, or MboI.

In some embodiments, the region of mitochondrial DNA comprises region R12, each member of the primer pair has at least 70% sequence identity with the sequence of the corresponding member of the following primer pair sequences: SEQ ID NOs: 40:41 and 40:64, and at least one restriction enzyme is BfaI, DdeI, or MseI.
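The percent identity thresholds recited in the embodiments above can be illustrated with a simple position-by-position comparison. This is a minimal sketch that assumes the two sequences are already aligned and of equal length; the function name `percent_identity` and the one-base variant are hypothetical, and other definitions of identity (for example, after gapped alignment) would give different values for sequences of unequal length.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(seq_a.lower(), seq_b.lower()) if x == y)
    return 100.0 * matches / len(seq_a)


reference = "tgactcacccatcaacaaccgc"  # forward primer, SEQ ID NO: 12 (Table 3)
variant = "tgactcacccatcaacaactgc"    # hypothetical variant with one substitution

identity = percent_identity(reference, variant)
print(round(identity, 1), identity >= 70.0)
```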

Ideally, primer hybridization sites are highly conserved in order to facilitate the hybridization of the primer. In cases where primer hybridization is less efficient due to lower levels of conservation of sequence, the primers of the present invention can be chemically modified to improve the efficiency of hybridization. For example, because any variation (due to codon wobble in the 3rd position) in these conserved regions among species is likely to occur in the third position of a DNA triplet, oligonucleotide primers can be designed such that the nucleotide corresponding to this position is a base which can bind to more than one nucleotide, referred to herein as a “universal base.” For example, under this “wobble” pairing, inosine (I) binds to U, C or A; guanine (G) binds to U or C, and uridine (U) binds to U or C. Other examples of universal bases include nitroindoles such as 5-nitroindole or 3-nitropyrrole (Loakes et al., Nucleosides and Nucleotides, 1995, 14, 1001-1003), the degenerate nucleotides dP or dK (Hill et al.), an acyclic nucleoside analog containing 5-nitroindazole (Van Aerschot et al., Nucleosides and Nucleotides, 1995, 14, 1053-1056) or the purine analog 1-(2-deoxy-β-D-ribofuranosyl)-imidazole-4-carboxamide (Sala et al., Nucl. Acids Res., 1996, 24, 3302-3306).

In another embodiment of the invention, to compensate for the somewhat weaker binding by the “wobble” base, the oligonucleotide primers are designed such that the first and second positions of each triplet are occupied by nucleotide analogs which bind with greater affinity than the unmodified nucleotide. Examples of these analogs include, but are not limited to, 2,6-diaminopurine which binds to thymine, propyne T (5-propynyluridine) which binds to adenine and propyne C (5-propynylcytidine) and phenoxazines, including G-clamp, which binds to G. Propynylated pyrimidines are described in U.S. Pat. Nos. 5,645,985, 5,830,653 and 5,484,908, each of which is commonly owned and incorporated herein by reference in its entirety. Propynylated primers are claimed in U.S. Ser. No. 10/294,203 which is also commonly owned and incorporated herein by reference in its entirety. Phenoxazines are described in U.S. Pat. Nos. 5,502,177, 5,763,588, and 6,005,096, each of which is incorporated herein by reference in its entirety. G-clamps are described in U.S. Pat. Nos. 6,007,992 and 6,028,183, each of which is incorporated herein by reference in its entirety. Thus, in other embodiments, the primer pair has at least one modified nucleobase such as 5-propynylcytidine or 5-propynyluridine.

The present invention also comprises isolated mitochondrial DNA amplicons which are produced by the process of amplification of a sample of mitochondrial DNA with any of the above-mentioned primers.

While the present invention has been described with specificity in accordance with certain of its embodiments, the following examples serve only to illustrate the invention and are not intended to limit the same.

EXAMPLES

Example 1 Nucleic Acid Isolation and Amplification

General Genomic DNA Sample Prep Protocol:

Raw samples were filtered using Supor-200 0.2 μm membrane syringe filters (VWR International). Samples were transferred to 1.5 ml Eppendorf tubes pre-filled with 0.45 g of 0.7 mm Zirconia beads followed by the addition of 350 μl of ATL buffer (Qiagen, Valencia, Calif.). The samples were subjected to bead beating for 10 minutes at a frequency of 19 1/s in a Retsch Vibration Mill (Retsch). After centrifugation, samples were transferred to an S-block plate (Qiagen, Valencia, Calif.) and DNA isolation was completed with a BioRobot 8000 nucleic acid isolation robot (Qiagen, Valencia, Calif.).

Isolation of Blood DNA—

Blood DNA was isolated using an MDx Biorobot according to the manufacturer's recommended procedure (Isolation of blood DNA on Qiagen QIAamp® DNA Blood BioRobot® MDx Kit, Qiagen, Valencia, Calif.).

Isolation of Buccal Swab DNA—

Since the manufacturer does not support a full robotic swab protocol, the blood DNA isolation protocol was employed after each swab was first suspended in 400 μl PBS+400 μl Qiagen AL buffer+20 μl Qiagen Protease solution in 14 ml round-bottom falcon tubes, which were then loaded into the tube holders on the MDx robot.

Isolation of DNA from Nails and Hairs—

The following procedure employs a Qiagen DNeasy® tissue kit and represents a modification of the manufacturer's suggested procedure: hairs or nails were cut into small segments with sterile scissors or razorblades and placed in a centrifuge tube to which was added 1 ml of sonication wash buffer (10 mM TRIS-Cl, pH 8.0+10 mM EDTA+0.5% Tween-20). The solution was sonicated for 20 minutes to dislodge debris and then washed 2× with 1 ml ultrapure double deionized water before addition of 100 μl of Buffer X1 (10 mM TRIS-Cl, pH 8.0+10 mM EDTA+100 mM NaCl+40 mM DTT+2% SDS+250 μg/ml Qiagen proteinase K). The sample was then incubated at 55° C. for 1-2 hours, after which 200 μl of Qiagen AL buffer and 210 μl isopropanol were added and the solution was mixed by vortexing. The sample was then added to a Qiagen DNeasy mini spin column placed in a 2 ml collection tube and centrifuged for 1 min at 6000 g (8000 rpm). Collection tube and flow-through were discarded. The spin column was transferred to a new collection tube and 500 μl of buffer AW2 was added before centrifuging for 3 min. at 20,000 g (14,000 rpm) to dry the membrane. For elution, 50-100 μl of buffer AE was pipetted directly onto the DNeasy membrane and eluted by centrifugation (6000 g, 8000 rpm) after incubation at room temperature for 1 min.

Amplification by PCR—

An exemplary PCR procedure for amplification of mitochondrial DNA is the following: A 50 μl total volume reaction mixture contained 1× GeneAmp® PCR buffer II (Applied Biosystems; 10 mM TRIS-Cl, pH 8.3 and 50 mM KCl), 1.5 mM MgCl₂, 400 mM betaine, 200 μM of each dNTP (Stratagene 200415), 250 nM of each primer, 2.5-5 units of Pfu exo(−) polymerase Gold (Stratagene 600163), and at least 50 pg of template DNA. All PCR solution mixing was performed under a HEPA-filtered positive pressure PCR hood. An example of a programmable PCR cycling profile is as follows: 95° C. for 10 minutes, followed by 8 cycles of 95° C. for 20 sec, 62° C. for 20 sec, and 72° C. for 30 sec, wherein the 62° C. annealing step is decreased by 1° C. on each successive cycle of the 8 cycles, followed by 28 cycles of 95° C. for 20 sec, 55° C. for 20 sec, and 72° C. for 30 sec, followed by holding at 4° C. Development and optimization of PCR reactions is routine to one with ordinary skill in the art and can be accomplished without undue experimentation.
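The cycling profile described above can be written out step by step. The sketch below is one possible reading, assuming that the first of the 8 touchdown cycles anneals at 62° C. and that each later cycle anneals 1° C. lower; the function name `touchdown_profile` and the tuple layout (step name, temperature in ° C., duration in seconds) are hypothetical.

```python
def touchdown_profile():
    """List the thermal cycler steps as (name, temperature_C, seconds) tuples."""
    steps = [("initial denaturation", 95, 600)]
    # Touchdown phase: assumed to start at 62 C and drop 1 C per cycle for 8 cycles.
    for cycle in range(8):
        steps.append(("denature", 95, 20))
        steps.append(("anneal", 62 - cycle, 20))
        steps.append(("extend", 72, 30))
    # Constant-temperature phase: 28 cycles with annealing at 55 C.
    for _ in range(28):
        steps.append(("denature", 95, 20))
        steps.append(("anneal", 55, 20))
        steps.append(("extend", 72, 30))
    steps.append(("hold", 4, None))  # held until the samples are removed
    return steps


profile = touchdown_profile()
touchdown_anneals = [temp for name, temp, _ in profile if name == "anneal"][:8]
print(touchdown_anneals)
```

Under this reading the last touchdown cycle anneals at 55° C., the same temperature used for the remaining 28 cycles.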

Example 2 Digestion of Amplicons with Restriction Enzymes

Reaction Conditions—

The standard restriction digest reaction conditions outlined herein are applicable to all panels of restriction enzymes. The PCR reaction mixture is diluted into 2×NEB buffer 1+BSA and 1 μl of each enzyme per 50 μl of reaction mixture is added. The mixture is incubated at 37° C. for 1 hour followed by 72° C. for 15 minutes. Restriction digest enzyme panels for HV1, HV2 and twelve additional regions of mitochondrial DNA are indicated in Table 2.

TABLE 2 mtDNA Regions, Coordinates and Restriction Enzyme Digest Panels
mtDNA REGION | COORDINATES RELATIVE TO THE ANDERSON SEQUENCE (SEQ ID NO: 72) | RESTRICTION ENZYME PANEL
HV1 (highly variable control region 1) | 16050-16410 | RsaI
HV2 (highly variable control region 2) | 29-429 | HaeIII HpaII MfeI SspI or HpaII, HpyCH4IV, PacI and EaeI
REGION R1 (COX2, Intergenic spacer, tRNA-Lys, ATP6) | 8162-8992 | DdeI MseI HaeIII MboI
REGION R2 (ND5) | 12438-13189 | DdeI HaeIII MboI MseI
REGION R3 (ND6 tRNA-Glu, CYTB) | 14629-15414 | DdeI MseI MboI BanI
REGION R4 (COX3, tRNA-Gly, ND3) | 9435-9461 | DdeI HpyCH4IV MseI HaeIII
REGION R5 (ND4L, ND4) | 10753-11500 | AluI BfaI MseI
REGION R6 (CYTB, tRNA-Thr, tRNA-Pro) | 15378-16006 | DdeI HaeIII MboI MseI RsaI
REGION R7 (ND5, ND6) | 13424-14206 | DdeI HpaII HaeIII MseI
REGION R8 (ND1) | 3452-4210 | BfaI DdeI EcoRI MboI
REGION R9 (COX2, Intergenic spacer, tRNA-Lys, ATP6) | 7734-8493 | BfaI DdeI HpaII HpyCH4IV MboI
REGION R10 (COX1) | 6309-7058 | BfaI HpaII MboI
REGION R11 (COX2, Intergenic spacer, tRNA-Lys, ATP6) | 7644-8371 | BfaI DdeI HpyCH4V MboI
REGION R12 (16S rRNA; ND1) | 2626-3377 | BfaI DdeI MseI
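A theoretical digest of an amplicon with one of the enzyme panels in Table 2 can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the examples: the recognition sequences and cut offsets are the commonly cited ones for a few of the listed enzymes and are not taken from this document, only the top strand of a linear sequence is considered, and the names `SITES`, `cut_positions` and `digest` as well as the toy sequence are hypothetical.

```python
import re

# Recognition sequence and top-strand cut offset for a few enzymes.
# Assumed values, e.g. RsaI cuts GT^AC and HpaII cuts C^CGG.
SITES = {
    "RsaI": ("GTAC", 2),
    "HaeIII": ("GGCC", 2),
    "HpaII": ("CCGG", 1),
    "MseI": ("TTAA", 1),
    "MboI": ("GATC", 0),
}


def cut_positions(seq, enzymes):
    """Sorted top-strand cut positions for a panel of enzymes on a linear sequence."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = SITES[name]
        # A lookahead pattern also finds overlapping occurrences of the site.
        for match in re.finditer("(?=" + site + ")", seq):
            cuts.add(match.start() + offset)
    # Cuts at the very ends would produce empty fragments, so they are dropped.
    return sorted(c for c in cuts if 0 < c < len(seq))


def digest(seq, enzymes):
    """Fragments of a complete digest, in 5' to 3' order."""
    seq = seq.upper()
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [seq[i:j] for i, j in zip(bounds, bounds[1:])]


print(digest("AAGTACCCGGTT", ["RsaI", "HpaII"]))  # toy sequence, not mtDNA
```

A fuller treatment would also track the complementary strand and the end chemistry left by each enzyme, since both strands contribute masses to the spectrum.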

Example 3 Nucleic Acid Purification

Procedure for Semi-Automated Purification of a PCR Mixture Using Commercially Available ZipTips®—

As described by Jiang and Hofstadler (Y. Jiang and S. A. Hofstadler, Anal. Biochem. 2003, 316, 50-57), an amplified nucleic acid mixture can be purified by commercially available pipette tips containing anion exchange resin. For pre-treatment of ZipTips® AX (Millipore Corp. Bedford, Mass.), the following steps were programmed to be performed by an Evolution™ P3 liquid handler (Perkin Elmer) with fluids being drawn from stock solutions in individual wells of a 96-well plate (Marshall Bioscience): loading of a rack of ZipTips® AX; washing of ZipTips® AX with 15 μl of 10% NH₄OH/50% methanol; washing of ZipTips® AX with 15 μl of water 8 times; washing of ZipTips® AX with 15 μl of 100 mM NH₄OAc.

For purification of a PCR mixture, 20 μl of crude PCR product was transferred to individual wells of an MJ Research plate using a BioHit (Helsinki, Finland) multichannel pipette. Individual wells of a 96-well plate were filled with 300 μl of 40 mM NH₄HCO₃. Individual wells of a 96-well plate were filled with 300 μl of 20% methanol. An MJ Research plate was filled with 10 μl of 4% NH₄OH. Two reservoirs were filled with deionized water. All plates and reservoirs were placed on the deck of the Evolution P3 (EP3) (Perkin-Elmer, Boston, Mass.) pipetting station in pre-arranged order. The following steps were programmed to be performed by an Evolution P3 pipetting station: aspiration of 20 μl of air into the EP3 P50 head; loading of a pre-treated rack of ZipTips® AX into the EP3 P50 head; dispensation of the 20 μl NH₄HCO₃ from the ZipTips® AX; loading of the PCR product into the ZipTips® AX by aspiration/dispensation of the PCR solution 18 times; washing of the ZipTips® AX containing bound nucleic acids with 15 μl of 40 mM NH₄HCO₃ 8 times; washing of the ZipTips® AX containing bound nucleic acids with 15 μl of 20% methanol 24 times; elution of the purified nucleic acids from the ZipTips® AX by aspiration/dispensation with 15 μl of 4% NH₄OH 18 times. For final preparation for analysis by ESI-MS, each sample was diluted 1:1 by volume with 70% methanol containing 50 mM piperidine and 50 mM imidazole.

Procedure for Semi-Automated Purification of a PCR mixture with Solution Capture—

The following procedure is disclosed in a U.S. patent application filed on May 12, 2004, (Attorney Docket No. IBIS0026-100): for pre-treatment of ProPac® WAX weak anion exchange resin, the following steps were performed in bulk: sequential washing three times (10:1 volume ratio of buffer to resin) with each of the following solutions: (1) 1.0 M formic acid/50% methanol, (2) 20% methanol, (3) 10% NH₄OH, (4) 20% methanol, (5) 40 mM NH₄HCO₃, and (6) 100 mM NH₄OAc. The resin is stored in 20 mM NH₄OAc/50% methanol at 4° C.

Corning 384-well glass fiber filter plates were pre-treated with two rinses of 250 μl NH₄OH and two rinses of 100 μl NH₄HCO₃.

For binding of the PCR product nucleic acids to the resin, the following steps were programmed to be performed by the Evolution™ P3 liquid handler: addition of 0.05 to 10 μl of pre-treated ProPac® WAX weak anion exchange resin (30 μl of a 1:60 dilution) to a 50 μl PCR reaction mixture (80 μl total volume) in a 96-well plate; mixing of the solution by aspiration/dispensation for 2.5 minutes; and transfer of the solution to a pre-treated Corning 384-well glass fiber filter plate. This step was followed by centrifugation to remove liquid from the resin, which was performed manually or under the control of a robotic arm.

The resin containing nucleic acids was then washed by rinsing three times with 200 μl of 100 mM NH₄OAc, 200 μl of 40 mM NH₄HCO₃ with removal of buffer by centrifugation for about 15 seconds followed by rinsing three times with 20% methanol for about 15 seconds. The final rinse was followed by an extended centrifugation step (1-2 minutes). Elution of the nucleic acids from the resin was accomplished by addition of 40 μl elution/electrospray buffer (25 mM piperidine/25 mM imidazole/35% methanol and 50 nM of an internal standard oligonucleotide for calibration of mass spectrometry signals) followed by elution from the 384-well filter plate into a 384-well catch plate by centrifugation. The eluted nucleic acids in this condition were amenable to analysis by ESI-MS. The time required for purification of samples in a single 96-well plate using a liquid handler is approximately five minutes.

Example 4 Mass Spectrometry

The mass spectrometer used is a Bruker Daltonics (Billerica, Mass.) Apex II 70e electrospray ionization Fourier transform ion cyclotron resonance mass spectrometer (ESI-FTICR-MS) that employs an actively shielded 7 Tesla superconducting magnet. All aspects of pulse sequence control and data acquisition were performed on a 1.1 GHz Pentium II data station running Bruker's Xmass software. 20 μl sample aliquots were extracted directly from 96-well microtiter plates using a CTC HTS PAL autosampler (LEAP Technologies, Carrboro, N.C.) triggered by the data station. Samples were injected directly into the ESI source at a flow rate of 75 μL/hr. Ions were formed via electrospray ionization in a modified Analytica (Branford, Conn.) source employing an off axis, grounded electrospray probe positioned ca. 1.5 cm from the metalized terminus of a glass desolvation capillary. The atmospheric pressure end of the glass capillary is biased at 6000 V relative to the ESI needle during data acquisition. A counter-current flow of dry N₂/O₂ was employed to assist in the desolvation process. Ions were accumulated in an external ion reservoir comprised of an rf-only hexapole, a skimmer cone, and an auxiliary gate electrode, prior to injection into the trapped ion cell where they were mass analyzed.

Spectral acquisition was performed in the continuous duty cycle mode whereby ions were accumulated in the hexapole ion reservoir simultaneously with ion detection in the trapped ion cell. Following a 1.2 ms transfer event, in which ions were transferred to the trapped ion cell, the ions were subjected to a 1.6 ms chirp excitation corresponding to 8000-500 m/z. Data was acquired over an m/z range of 500-5000 (1M data points over a 225 kHz bandwidth). Each spectrum was the result of co-adding 32 transients. Transients were zero-filled once prior to the magnitude mode Fourier transform and post calibration using the internal mass standard. The ICR-2LS software package (G. A. Anderson, J. E. Bruce, Pacific Northwest National Laboratory, Richland, Wash., 1995) was used to deconvolute the mass spectra and calculate the mass of the monoisotopic species using an “averaging” fitting routine (M. W. Senko, S. C. Beu, F. W. McLafferty, J. Am. Soc. Mass Spectrom. 1995, 6, 229) modified for DNA. Using this approach, monoisotopic molecular weights were calculated.
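Deconvolution relates a neutral mass to the series of multiply charged ions observed for it. The relationship can be sketched as follows, assuming negative-ion electrospray in which each charge arises from loss of one proton (the polarity is an assumption, as it is not stated above); the function name `charge_state_series` and the example mass are hypothetical, and the default window is the 500-5000 m/z acquisition range given above.

```python
PROTON = 1.007276  # mass of a proton in Da


def charge_state_series(neutral_mass, mz_min=500.0, mz_max=5000.0, max_charge=100):
    """(z, m/z) pairs of deprotonated ions that fall inside the acquisition window."""
    series = []
    for z in range(1, max_charge + 1):
        mz = (neutral_mass - z * PROTON) / z
        if mz_min <= mz <= mz_max:
            series.append((z, mz))
    return series


# Hypothetical 30,000 Da strand: list the lowest and highest observable charge states.
series = charge_state_series(30000.0)
print(series[0], series[-1])
```

Deconvolution software works in the opposite direction, inferring the charge of each peak and collapsing the series back to a single neutral mass.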

Example 5 Primer Pairs for Amplification of Informative Regions of Mitochondrial DNA

Conventional forensic mitochondrial DNA analysis typically involves amplification and sequencing of the two hypervariable regions within the non-coding control region known as HV1 and HV2. The present invention comprises primer pairs for amplification of informative regions within HV1 and HV2 (SEQ ID NOs: 8-17, 42-48 and 65-71 in Table 3). Additional individual discriminating power has been obtained by the selection for analysis of 12 additional non-control regions (Regions R1-R12) from which informative amplification products of approximately 630-840 bp each can be obtained using additional primer pairs (SEQ ID NOs: 18-41 and 49-70 in Table 3). The primers listed below in Table 3 are generally 10-50 nucleotides in length, 15-35 nucleotides in length, or 18-30 nucleotides in length.

By convention, human mtDNA sequences are described using the first complete and published mtDNA sequence as a reference (Anderson, S. et al., Nature, 1981, 290, 457-465). This sequence is commonly referred to as the Anderson sequence. Primer pair names in Table 3 indicate the mtDNA amplicon coordinates with reference to the Anderson mtDNA sequence: GenBank Accession No. NC_001807.3 (SEQ ID NO: 72). For example, primer pair 8:9 produces an amplicon which corresponds to positions 76-353 of the Anderson sequence.

TABLE 3 Primer Pairs for Analysis of mtDNA
PRIMER PAIR NAME | mtDNA REGION AMPLIFIED | FORWARD PRIMER SEQUENCE | FORWARD SEQ ID NO: | REVERSE PRIMER SEQUENCE | REVERSE SEQ ID NO:
HMTHV2_ANDRSN_76_353_TMOD | REGION HV2 | tcacgcgatagcattgcg | 8 | tggtttggcagagatgtgtttaagt | 9
HMTHV2_ANDRSN_29_429_TMOD | REGION HV2 | tctcacgggagctctccatgc | 10 | tctgttaaaagtgcataccgcca | 11
HMTHV1_ANDRSN_16065_16410_TMOD | REGION HV1 | tgactcacccatcaacaaccgc | 12 | tgaggatggtggtcaagggac | 13
HMTHV1_ANDRSN_16065_16354_TMOD | REGION HV1 | tgactcacccatcaacaaccgc | 12 | tggatttgactgtaatgtgcta | 14
HMTHV1_ANDRSN_16064_16359 | REGION HV1 | tgactcacccatcaacaaccgc | 12 | tgaagggatttgactgtaatgtgctatg | 15
HMT_ASN_16036_522 | REGION HV1 and REGION HV2 | gaagcagatttgggtaccacc | 16 | gtgtgtgtgctgggtaggatg | 17
HMT_ASN_8162_8916 | REGION R1 (COX2, Intergenic spacer, tRNA-Lys, ATP6) | tacggtcaatgctctgaaatctgtgg | 18 | tggtaagaagtgggctagggcatt | 19
HMT_ASN_12438_13189 | REGION R2 (ND5) | ttatgtaaaatccattgtcgcatccacc | 20 | tggtgatagcgcctaagcatagtg | 21
HMT_ASN_14629_15353 | REGION R3 (ND6 tRNA-Glu, CYTB) | tcccattactaaacccacactcaacag | 22 | tttcgtgcaagaataggaggtggag | 23
HMT_ASN_9435_10188 | REGION R4 (COX3, tRNA-Gly, ND3) | taaggccttcgatacgggataatccta | 24 | tagggtcgaagccgcactcg | 25
HMT_ASN_10753_11500 | REGION R5 (ND4L, ND4) | tactccaatgctaaaactaatcgtcccaac | 26 | tgtgaggcgtattataccatagccg | 27
HMT_ASN_15369_16006 | REGION R6 (CYTB, tRNA-Thr, tRNA-Pro) | tcctaggaatcacctcccattccga | 28 | tagaatcttagctttgggtgctaatggtg | 29
HMT_ASN_13461_14206 | REGION R7 (ND5, ND6) | tggcagcctagcattagcaggaata | 30 | tggctgaacattgtttgttggtgt | 31
HMT_ASN_3452_4210 | REGION R8 (ND1) | tcgctgacgccataaaactcttcac | 32 | taagtaatgctagggtgagtggtaggaag | 33
HMT_ASN_7734_8493 | REGION R9 (COX2, Intergenic spacer, tRNA-Lys, ATP6) | taactaatactaacatctcagacgctcagga | 34 | tttatgggctttggtgagggaggta | 35
HMT_ASN_6309_7058 | REGION R10 (COX1) | tactcccaccctggagcctc | 36 | tgctcctattgataggacatagtggaagtg | 37
HMT_ASN_7644_8371 | REGION R11 (COX2, Intergenic spacer, tRNA-Lys, ATP6) | ttatcacctttcatgatcacgccct | 38 | tggcatttcactgtaaagaggtgttgg | 39
HMT_ASN_2626_3377 | REGION R12 (16S rRNA; ND1) | tgtatgaatggctccacgagggt | 40 | tcggtaagcattaggaatgccattgc | 41
HMTHV1_ANDRSN_16065_16410 | REGION HV1 | gactcacccatcaacaaccgc | 42 | gaggatggtggtcaagggac | 43
HMTHV2_ANDRSN_29_429 | REGION HV2 | ctcacgggagctctccatgc | 44 | ctgttaaaagtgcataccgcca | 45
HMTHV1_ANDRSN_16065_16354 | REGION HV1 | gactcacccatcaacaaccgc | 42 | ggatttgactgtaatgtgcta | 46
HMTHV2_ANDRSN_76_353 | REGION HV2 | cacgcgatagcattgcg | 47 | ggtttggcagagatgtgtttaagt | 48
HMT_ASN_8162_8992 | REGION R1 (COX2, Intergenic spacer, tRNA-Lys, ATP6) | tacggtcaatgctctgaaatctgtgg | 18 | tggctattggttgaatgagtaggctg | 49
HMT_ASN_12432_13262 | REGION R2 (ND5) | tccccattatgtaaaatccattgtcgc | 50 | tgacttgaagtggagaaggctacg | 51
HMT_ASN_14629_15414 | REGION R3 (ND6 tRNA-Glu, CYTB) | tcccattactaaacccacactcaacag | 22 | taagggtggaaggtgattttatcggaa | 52
HMT_ASN_9411_10190 | REGION R4 (COX3, tRNA-Gly, ND3) | tgccaccacacaccacctg | 53 | tatagggtcgaagccgcactc | 54
HMT_ASN_10751_11514 | REGION R5 (ND4L, ND4) | tctactccaatgctaaaactaatcgtccc | 55 | tggttgagaatgagtgtgaggcg | 56
HMT_ASN_15378_16006 | REGION R6 (CYTB, tRNA-Thr, tRNA-Pro) | tcacctcccattccgataaaatcacct | 57 | tagaatcttagctttgggtgctaatggtg | 29
HMT_ASN_13424_14206 | REGION R7 (ND5, ND6) | tcaaaaccatacctctcacttcaacctc | 58 | tggctgaacattgtttgttggtgt | 31
HMT_ASN_3443_4210_2 | REGION R8 (ND1) | tacaacccttcgctgacgccat | 59 | taagtaatgctagggtgagtggtaggaa | 60
HMT_ASN_6278_7006 | REGION R10 (COX1) | ttgaacagtctaccctcccttagc | 61 | tgtagtacgatgtctagtgatgagtttgc | 62
HMT_ASN_7688_8371 | REGION R11 (COX2, Intergenic spacer, tRNA-Lys, ATP6) | tgcttcctagtcctgtatgcccttttcc | 63 | tggcatttcactgtaaagaggtgttgg | 39
HMT_ASN_2626_3463 | REGION R12 (16S rRNA; ND1) | tgtatgaatggctccacgagggt | 40 | tggcgtcagcgaagggttgta | 64
HMTHV2_ASN_72_357 | REGION HV2 | tgtgcacgcgatagcattgcg | 65 | tggggtttggcagagatgtgtttaagt | 66
HMTHV1_ASN_16056_16362 | REGION HV1 | tcaagtattgactcacccatcaacaacc | 67 | tcgagaagggatttgactgtaatgtgcta | 68
HMTHV1_ASN_16050_16370 | REGION HV1 | taccacccaagtattgactcacccatc | 69 | tcatggggacgagaagggatttgac | 70
HMTHV1_ASN_16064_16362 | REGION HV1 | tgactcacccatcaacaaccgc | 12 | tcgagaagggatttgactgtaatgtgcta | 68
HMTHV1_ASN_16064_16370 | REGION HV1 | tgactcacccatcaacaaccgc | 12 | tcatggggacgagaagggatttgac | 70
HMTHV1_ASN_16056_16359 | REGION HV1 | tcaagtattgactcacccatcaacaacc | 67 | tgaagggatttgactgtaatgtgctatg | 15
HMTHV1_ASN_16056_16370 | REGION HV1 | tcaagtattgactcacccatcaacaacc | 71 | tcatggggacgagaagggatttgac | 70
HMTHV1_ASN_16050_16359 | REGION HV1 | taccacccaagtattgactcacccatc | 69 | tgaagggatttgactgtaatgtgctatg | 15
HMTHV1_ASN_16050_16362 | REGION HV1 | taccacccaagtattgactcacccatc | 69 | tcgagaagggatttgactgtaatgtgcta | 68
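Because the primer pair names encode amplicon coordinates on the Anderson sequence, the coordinates can be read back from a name programmatically. The sketch below assumes the name contains the start and end positions as its first two underscore-delimited numeric fields; the function names are hypothetical, and the genome length is left as a caller-supplied parameter because it is only needed for amplicons that span the origin of the circular sequence.

```python
import re


def amplicon_coordinates(pair_name):
    """Return (start, end) from the first two numeric fields of a primer pair name."""
    fields = re.findall(r"_(\d+)(?=_|$)", pair_name)
    if len(fields) < 2:
        raise ValueError("no coordinate pair found in " + pair_name)
    return int(fields[0]), int(fields[1])


def amplicon_length(start, end, genome_length=None):
    """Inclusive length in bp; wraps around the origin when end < start."""
    if end >= start:
        return end - start + 1
    if genome_length is None:
        raise ValueError("genome_length is required when the amplicon spans the origin")
    return genome_length - start + 1 + end


start, end = amplicon_coordinates("HMTHV2_ANDRSN_76_353_TMOD")
print(start, end, amplicon_length(start, end))
```

The length computed this way counts reference positions only and does not account for any bases added to the primers themselves.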

Example 6 Analysis of 10 Blinded DNA Samples

Ten different blinded samples of human DNA provided by the FBI were subjected to rapid mtDNA analysis by the method of the present invention according to the process illustrated in FIG. 3. After amplification of human mtDNA by PCR (210), the PCR products were subjected to restriction digestion (220) with RsaI for HV1 and a combination of HpaII, HpyCH4IV, PacI and EaeI for HV2 in order to obtain amplicon segments suitable for analysis by mass spectrometry (230). The data were processed to obtain mass data for each amplicon fragment (240) from which a “fragment coverage map” was generated (an example of a fragment coverage map is shown in FIG. 3, represented as a series of horizontal bars beneath the mass spectrum). The fragment coverage map was then compared, using a scoring scheme, to fragment coverage maps calculated for theoretical digests from mtDNA sequences in the FBI mtDNA database (250).

A group of 10 blinded DNA samples was provided by the FBI. HV1 and HV2 primer pairs were selected from a sequence alignment created by translating the FBI's forensic mtDNA database back into full sequences via comparison to the Anderson reference, then selecting primers within the full representation core of the alignment and restriction enzymes that will cleave the 280 and 292 bp PCR products into mass spectrometry-compatible fragments. Primer pairs selected for amplification of HV1 segments were SEQ ID NOs: 12:14 and 42:43. Primer pairs selected for amplification of HV2 segments were SEQ ID NOs: 8:9 and 44:45 (Table 3). PCR amplification was carried out as indicated in Example 1, with the exception that 2 mM MgCl₂ was included instead of 1.5 mM MgCl₂, and that 4 units of AmpliTaq Gold® polymerase (Applied Biosystems) was included instead of 2.5 units of Pfu exo(−) polymerase. 3 μl of FBI DNA sample was included in the reaction. Thermal cycler parameters were as follows: 96° C. for 10 min., followed by 45 cycles of the following: 96° C. for 30 sec., 54° C. for 30 sec., and 72° C. for 30 sec., after which the reaction was kept at 72° C. for 5 minutes.

Theoretical digestions of the 2754 unambiguous unique sequences contained within the 4840 FBI sequence entries (there are 399 sequences in the FBI database which contain at least one ambiguous base call within the amplified regions, leading to 4441 unambiguous sequences, 2754 of which are unique), with all possible products resulting from incomplete digestion, were performed and fragment start and end coordinates, base composition, mass, and end chemistry were stored in a data structure for subsequent fragment pattern reconstruction. A deconvolved list of monoisotopic exact mass determinations from ICR-2LS was determined for each restriction digestion for each blinded sample. For each sample, expected digestion fragment masses were matched to observed masses with a threshold of ±4 ppm for each database entry (1 ppm match error is defined as a difference between observed and expected mass equal to one millionth of the expected mass).
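The matching step can be illustrated with a short sketch. It follows the definition of ppm error given above and enumerates incomplete-digestion products as contiguous runs of complete-digest fragments; the function names and the example masses are hypothetical, and the data structure actually used for fragment pattern reconstruction is not reproduced here.

```python
def ppm_error(observed, expected):
    """Signed match error in parts per million of the expected mass."""
    return (observed - expected) / expected * 1e6


def partial_digest_spans(n_fragments):
    """Index spans (first, last) of every contiguous run of complete-digest fragments.

    Spans with first == last are the complete-digest fragments themselves;
    longer spans correspond to products of incomplete digestion.
    """
    return [(i, j) for i in range(n_fragments) for j in range(i, n_fragments)]


def match_masses(observed, expected, tol_ppm=4.0):
    """Pairs (expected, observed) whose masses agree within the ppm tolerance."""
    matches = []
    for exp in expected:
        for obs in observed:
            if abs(ppm_error(obs, exp)) <= tol_ppm:
                matches.append((exp, obs))
    return matches


# Hypothetical masses in Da: the first observation is about 3 ppm off, the second 5 ppm.
print(match_masses([10000.03, 20000.10], [10000.0, 20000.0]))
print(len(partial_digest_spans(3)))
```

A digest that yields n complete fragments has n(n+1)/2 such spans, which is why the incomplete-digestion products are computed once per database sequence and stored.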

To evaluate the ability of a single-pass MS-based assay to exclude known database entries as having base compositions that are different than that of an unknown sample, a scoring system was devised that, for a given input sample, assigns each database sequence a score relative to the highest scoring sequence. To evaluate whether base composition of mtDNA fragments can achieve a discrimination power approaching that of sequencing, the ten blinded samples of human DNA from the FBI were analyzed. The overall consistency of the observed digestion products with the expected fragment pattern for each of the 4840 database entries was scored using the sum of four values: (1) the total number of observed masses accounted for in the expected fragment list; (2) the percentage of expected fragments observed for a complete digestion; (3) a “floating percentage” of expected fragments matched, where matches to incomplete digestion fragments were scored ½ percentage point and the total number of expected fragments was incremented by ½ for each observed incomplete digestion fragment; and (4) the percentage of sequence positions accounted for by matches with observed masses. Scores for the HV1 and HV2 regions were summed to produce a total score for each entry. Database entries were sorted by high score and assigned a final score as a percentage of the top score. An arbitrary (but conservative) scoring threshold of 80% of the top score was set to produce a very conservative lower bound on the percentage of database entries that could be excluded as inconsistent with each sample.
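The structure of this scoring scheme can be sketched in a few lines. The sketch is one possible reading rather than the scoring code used for the blinded samples: in particular, the “floating percentage” is interpreted here as giving each incomplete-digestion match half credit in the numerator and adding one half to the denominator, and all function names and numeric inputs are hypothetical.

```python
def floating_percentage(complete_matches, partial_matches, expected_fragments):
    """Assumed reading: partial matches count one half in numerator and denominator."""
    numerator = complete_matches + 0.5 * partial_matches
    denominator = expected_fragments + 0.5 * partial_matches
    return 100.0 * numerator / denominator


def region_score(masses_matched, pct_fragments, floating_pct, pct_sequence):
    """Sum of the four values scored for one region (HV1 or HV2)."""
    return masses_matched + pct_fragments + floating_pct + pct_sequence


def relative_scores(total_scores):
    """Express each entry's total score as a percentage of the top score."""
    top = max(total_scores.values())
    return {entry: 100.0 * score / top for entry, score in total_scores.items()}


# Hypothetical entries: total score is the HV1 region score plus the HV2 region score.
totals = {
    "entry_A": region_score(12, 60.0, floating_percentage(6, 2, 10), 95.0)
               + region_score(10, 50.0, floating_percentage(5, 0, 10), 90.0),
    "entry_B": region_score(5, 20.0, floating_percentage(2, 1, 10), 40.0)
               + region_score(4, 10.0, floating_percentage(1, 0, 10), 30.0),
}
relative = relative_scores(totals)
excluded = [entry for entry, pct in relative.items() if pct < 80.0]
print(relative, excluded)
```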

Without knowing the true sequence of the initial ten samples and allowing for slight experimental variations in restriction digestions and mass spectrometry, comparison to a large collection of database entries enabled exclusion of a vast majority of entries in the database. Table 4 shows an example of the scoring output for one sample (sample 3) and summarizes the exclusion percentages for each of the blinded samples for a set of reactions run side-by-side on a single day. The HV1 and HV2 regions of each sample were sequenced following the analysis described in this work for final verification. Table 4 summarizes the overall results of this exercise for this preliminary data analysis.

TABLE 4
Scoring of FBI Sample 3 Against the FBI Mitochondrial DNA Database

Row   Database Entry Title            Number of     % of      % of      Floating  Match  Cumulative  % Match
                                      Sequences     Sequence  Fragment  Fragment  Score  Score       Score
                                      Represented   Covered   Covered   Covered
1     AUT.CAU.000066|USA.CAU.000389|  6             99.655    51.04     63.18     32.5   333.89      100
      USA.CAU.000572|USA.CAU.000841|
      USA.CAU.001074|USA.CAU.001211
2     USA.CAU.000101                  1             90.92     47.02     57.005    24.5   300.38      89.9638
3     USA.CAU.000783                  1             90.75     44.79     56.37     27     298.08      89.2749
4     USA.CAU.000130                  1             88.18     46.53     56.68     27.5   296.92      88.9275
5     USA.CAU.000142                  1             88.18     46.53     56.68     27.5   296.92      88.9275
6     FRA.CAU.000087|GRC.CAU.000032|  7             92.765    42.71     51.86     25     295.95      88.637
      USA.CAU.000425|USA.CAU.000483|
      USA.CAU.000772|USA.CAU.001067|
      USA.CAU.001168
44    USA.HIS.000672                  1             84.52     40.555    46.43     18     268.15      80.3109
45    FRA.CAU.000108|USA.CAU.000890   2             92.055    33.035    42.22     17.5   267.68      80.1701
46    USA.CAU.000361|USA.CAU.001184   2             92.055    33.035    42.22     17.5   267.68      80.1701
47    USA.CAU.001378|USA.CAU.001382   2             92.055    33.035    42.22     17.5   267.68      80.1701
48    CHN.ASN.000443                  1             88.525    34.03     43.135    22     267.11      79.9994
49    USA.CAU.000548                  1             83.385    39.58     47.795    21     266.93      79.9455
50    USA.CAU.000814                  1             83.385    39.58     47.795    21     266.93      79.9455
51    USA.CAU.000338|USA.CAU.000580|  3             99.655    24.7      36.37     17     265.71      79.5801
      USA.CAU.001139
2750  USA.AFR.000947                  1             20.205    0         4.285     3      43.41       13.0013
2751  USA.AFR.000558                  1             8.735     5.555     10        6      34.58       10.3567
2752  SKE.AFR.000107                  1             5.495     8.335     8.335     2      29.66       8.88317
2753  USA.AFR.000440                  1             5.495     8.335     8.335     2      29.66       8.88317
2754  EGY.AFR.000021                  1             11.475    0         1.515     1      23.95       7.17302

Table 4 illustrates the example of scoring sample 3 against the mtDNA database of 4441 entries (4840 original FBI mtDNA entries minus the 399 sequences containing ambiguous base calls). The total combined score for the HV1 and HV2 regions is shown in the column entitled “cumulative score”. All entries are given a score relative to the highest cumulative score in the column “% max score”. Database entry titles are in the column “DB entries.” Sequences whose HV1 and HV2 PCR products are identical are grouped into bins, with entry titles separated by vertical lines. The cut-off point for this exercise was defined as 80% of the top cumulative score. The two bins that define this boundary are rows 47 and 48. The total number of database entries that fall below this threshold is 4347, or 97.9%.
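The relative scoring and 80% cut-off described for Table 4 can be illustrated with a short sketch. The cumulative scores below are copied from a few rows of Table 4; the function and variable names are hypothetical and are not part of the software described in this work.

```python
# Cumulative scores for a few Table 4 rows (row number -> cumulative score).
cumulative = {1: 333.89, 2: 300.38, 47: 267.68, 48: 267.11, 2754: 23.95}

def percent_of_top(scores):
    """Express each cumulative score as a percentage of the top score."""
    top = max(scores.values())
    return {row: 100.0 * s / top for row, s in scores.items()}

def rows_below_cutoff(scores, threshold=80.0):
    """Rows whose relative score falls below the cut-off (default 80%)."""
    relative = percent_of_top(scores)
    return sorted(row for row, pct in relative.items() if pct < threshold)

relative = percent_of_top(cumulative)
below = rows_below_cutoff(cumulative)

# Share of the 4441-entry database excluded for sample 3 (counts from the text).
excluded_percent = 100.0 * 4347 / 4441
```

Under these inputs, rows 47 and 48 fall on either side of the 80% boundary, which is consistent with the description of the cut-off above.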

Identification codes used in Table 4 are from the mtDNA population database (Miller K W, Budowle B. Croat. Med. J. 2001, 42(3), 315-27). AFR: African; CAU: Caucasian; ASN: Asian; CHN: Chinese; HIS: Hispanic; AUT: Austrian; EGY: Egypt; FRA: France; GRC: Greece; SKE: Sierra Leone.

Example 7 Optimization of Amplification Conditions and Reagents for Efficient Data Processing and Pattern Matching

Forensic analysis of human mtDNA by mass spectrometry presents a number of challenges. First, PCR amplification reactions may result in non-templated additions of adenosine to the 3′-end of the template. When this occurs, mass spectrum signals become mixed and detection sensitivity is lowered. Second, the process of carrying out several purification steps to convert a PCR amplification mixture to the specific buffer conditions required for specific restriction digests results in significant sample loss. Lastly, a significant subset of useful restriction endonucleases yields double-stranded digest products with staggered ends. This complicates restriction pattern analysis and limits the choice of restriction endonucleases to those that generate only blunt-ended digestion products.

These complications have been solved by the use of exo(−) Pfu polymerase (Stratagene, La Jolla, Calif.), a 3′-5′ exonuclease-deleted Pfu polymerase. The mass spectra of FIG. 4 indicate that the use of exo(−) Pfu polymerase prevents the addition of non-templated adenosine residues and the 3′-end deletions which are normally observed when standard Pfu polymerases are used. The resulting product exhibited a strong signal in the mass spectrum. On the other hand, use of the commonly used Amplitaq Gold polymerase (Applied Biosystems) did not circumvent this problem (FIG. 4). An additional advantage obtained through the use of exo(−) Pfu polymerase is that there is no need for purification of the PCR product. The PCR product mixture can be easily modified with an appropriate restriction enzyme activating buffer which is also compatible with the exo(−) Pfu polymerase.

A further advantage obtained from the avoidance of a purification procedure is that exo(−) Pfu polymerase remains viable throughout the subsequent restriction digest process, and this remaining polymerase activity can be used to add leftover dNTPs to convert staggered restriction products to blunt-ended products by filling in the “missing” nucleotide residues.

Thus, crude PCR products are directly subjected to the restriction digestion process, minimizing time, sample handling and potential contamination. FIG. 5 indicates that exo(−) Pfu polymerase is effective for consistent amplification of mtDNA obtained from blood, fingernail and saliva samples. PCR conditions for this experiment were as follows: a 50 μl reaction volume containing 10 mM TRIS-HCl, 50 mM KCl, MgCl₂, 200 μM deoxynucleotide triphosphates, 400 mM betaine, 200 nM primers, 4 units of Amplitaq Gold™ or 5 units of exo(−) Pfu polymerase, and mtDNA template was incubated at 95° C. for 10 minutes, followed by 35 cycles of the following thermal sequence: 95° C. for 20 seconds, 52° C. for 20 seconds, 72° C. for 30 seconds. Following the 35 cycles, the reaction was incubated at 72° C. for 4 minutes.

To take advantage of the modified function of the exo(−) Pfu polymerase, the experimental method was modified as follows: upon completion of amplification of mtDNA, restriction endonucleases were added to the amplification mixture which was then incubated for 1 hour at 37° C. The temperature of the mixture was then raised to 37° C. for 15 minutes to activate the exo(−) Pfu polymerase and enable the addition of nucleotides to staggered ends to produce the blunt ends which facilitate pattern analysis.

As discussed above, the ability of exo(−) Pfu polymerase to convert staggered ends to blunt ends expands the number of restriction endonucleases that are compatible with the present method and simplifies data processing by simplifying restriction digest patterns. Shown in FIG. 6 is the result of a comparison of digest patterns obtained when the originally chosen restriction enzymes EaeI and PacI are replaced with HaeIII and HpyCH4V. The newly chosen enzymes produce a restriction digest pattern with better spacing of conserved restriction sites, which facilitates analysis. Shown in FIG. 7 is the result of a gel electrophoresis analysis of the products of restriction digests. In this experiment, an HV2 amplicon from a human mtDNA sample designated Seracare N31773 was analyzed. The mtDNA sample was amplified with Amplitaq Gold in 50 μl reaction volumes where 25 μl of PCR reaction was diluted up to 50 μl in: 1×NEB restriction buffer #1, 10 mM Bis-TRIS Propane-HCl, 10 mM MgCl₂, 1 mM DTT pH 7.0 (at 25° C.), 1×NEB BSA and (separately) 100 mg/μl in 1 μl volumes of each enzyme as follows: EaeI: 3 units; HpyCH4IV: 10 units; HpyCH4V: 5 units; HpaII: 10 units; PacI: 10 units; and HaeIII: 10 units. The mixtures were incubated for 1 hour at 37° C. before analysis in 4% agarose gel.

Restriction endonucleases MfeI and SspI are both useful alternatives to HpyCH4V and HpyCH4IV respectively, because they cleave at similar positions and cost significantly less than HpyCH4V and HpyCH4IV.

Example 8 Validation of Mitochondrial DNA Analysis Method: Analysis of Human Cheek Swab mtDNA Samples and Comparison with the mtDNA Population Database

Cheek swabs were obtained from 16 volunteer donors. Genomic DNA was isolated from the cheek swabs on a Qiagen MDx robot according to procedures outlined in Example 1. Final elution volumes were 160 μl for each well. 2 μl template was used in each PCR reaction which was run according to Example 1 except that the following cycling parameters were used: 95° C. for 10 minutes followed by 45 cycles of 95° C. for 20 sec, 52° C. for 20 sec and 72° C. for 30 sec, followed by holding at 72° C. for 4 minutes. Primer pairs used for HV1 were SEQ ID NOs: 12:15 and for HV2, SEQ ID NOs: 8:9.

PCR products (not shown) were digested with RsaI (HV1) or HaeIII, HpaII, HpyCH4IV, and HpyCH4V (HV2) according to the procedure outlined in Example 2.

Restriction digests were performed in duplicate with each duplicate swab, followed by mass determination of the amplicon fragments by mass spectrometry as described in Example 3. Samples were qualitatively scored for HV1 and HV2 against each unique database entry by the sum of:

a) the percentage of expected fragments observed in the mass spectrum;

b) the percentage of sequence positions covered by matched masses; and

c) the total number of observed mass peaks accounted for by matches to theoretical digest fragments.
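The three-part qualitative score listed above can be sketched as follows. This is a minimal illustration under stated assumptions: the data layout (a list of expected fragments with lengths, and a mapping from observed peaks to the fragment each one matched) and all names are hypothetical, and the matching step itself is assumed to have been done elsewhere.

```python
def score_entry(expected, peak_matches):
    """
    Sum of the three components (a), (b) and (c) for one database entry.

    expected:     list of (fragment_id, length_in_bases) from the theoretical
                  digest of the entry (hypothetical layout).
    peak_matches: dict mapping each observed peak label to the fragment_id it
                  matched, or None if the peak matched no expected fragment.
    """
    expected_ids = {fid for fid, _ in expected}
    matched_ids = {fid for fid in peak_matches.values() if fid in expected_ids}

    total_len = sum(length for _, length in expected)
    covered_len = sum(length for fid, length in expected if fid in matched_ids)

    pct_fragments = 100.0 * len(matched_ids) / len(expected)   # component (a)
    pct_positions = 100.0 * covered_len / total_len            # component (b)
    n_peaks = sum(1 for fid in peak_matches.values()           # component (c)
                  if fid in expected_ids)
    return pct_fragments + pct_positions + n_peaks

# Toy entry: three expected fragments; two strands of f1 and one strand of f3
# were observed, and one peak matched nothing.
expected = [("f1", 40), ("f2", 60), ("f3", 100)]
peaks = {"peak1": "f1", "peak2": "f1", "peak3": "f3", "peak4": None}
score = score_entry(expected, peaks)
```

In this toy case two of three fragments are matched, 140 of 200 positions are covered, and three peaks are accounted for.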

Table 5 shows that, for the majority of the 16 samples, the ethnic designation of the majority of top-scoring entries from the FBI database coincides with the ethnic background of the donor. In general, mtDNA sequence data cannot be used to reliably associate a sample with the ethnic background of the donor, because the mitochondria follow the maternal line exclusively and ethnic mixing in populations increases as the general population becomes increasingly genetically integrated. However, as an overall assessment of the preliminary matching and scoring system, this association served well, because at the time of this evaluation, mtDNA samples had not been sequenced. Two outliers in the association of donor ethnic background and major ethnic backgrounds of top database scores were samples 2 and 16. Sample 2 was from an African-American male with top database scores all designated “USA.CAU.xxx”. Upon inquiry, it was learned that this donor has a Caucasian mother. Because mtDNA is inherited maternally, the result appears valid.

TABLE 5
Results of Cheek Swab Comparison to the mtDNA Population Database

Donor  Full pattern    Number of DB  % of database  % of database  % of database  % of database  % of database  Ethnicity                   Donor
       match in mtDNA  entries with  below highest  below 95% of   below 90% of   below 85% of   below 80% of   closest                     Ethnicity
       database        highest score score          highest score  highest score  highest score  highest score  match
1      USA.AFR.000975  1             99.979         99.979         99.959         99.917         99.917         AFR                         Af. Amer.
2      USA.CAU.000191  3             99.938         99.731         99.153         96.054         92.169         CAU                         Af. Amer. with Cauc. mother
       USA.CAU.001303
       USA.CAU.001041
3      None            1             99.979         99.938         99.917         99.566         98.905         CHN                         Chinese
4      AUT.CAU.000080  22            99.545         98.12          96.777         89.628         83.657         17 CAU, 4 HIS, 2 AFR        Caucasian
       AUT.CAU.000090
       AUT.CAU.000099
       FRA.CAU.000041
       18 more . . .
5      None            13            99.731         99.731         99.587         98.678         97.417         12 CAU, 1 AFR               Caucasian
6      None            1             99.979         99.793         99.442         97.438         94.38          CAU                         Caucasian
7      None            1             99.979         99.959         99.628         96.529         94.587         ASN                         Chinese
8      None            1             99.979         99.876         99.793         98.244         96.157         CAU                         Caucasian
9      USA.CAU.000031  1             99.979         99.979         99.979         99.256         98.574         CAU                         Caucasian
10     USA.CAU.000303  2             99.959         98.285         96.364         87.934         78.616         CAU                         Caucasian
       USA.CAU.000969
11     None            2             99.959         99.959         99.835         99.814         99.36          ASN                         Chinese
12     USA.CAU.000113  1             99.979         99.897         99.07          98.099         92.149         CAU                         Caucasian
13     CHN.ASN.000374  12            99.752         99.442         98.243         89.917         84.091         5 CAU, 3 ASN, 3 AFR, 1 335  Caucasian
       CHN.ASN.000411
       USA.335.000122
       GRC.CAU.000007
       9 others . . .
14     USA.CAU.000297  1             99.979         99.979         99.917         99.649         95.806         CAU                         Caucasian
15     None            1             99.979         99.959         98.037         92.417         86.59          ASN                         Indian (India)
16     AUT.CAU.000096  12            99.752         99.669         98.863         95.971         84.7737        CAU                         Indian (India)
       AUT.CAU.000100
       GRC.CAU.000011
       USA.CAU.000604
       4 others . . .

Identification codes used in Table 5 are from the mtDNA population database (Miller K W, Budowle B. Croat. Med. J. 2001, 42(3), 315-27). AFR: African; CAU: Caucasian; ASN: Asian; CHN: Chinese; HIS: Hispanic; AUT: Austrian; GRC: Greece. Code 335 (USA.335) in the donor 13 entry refers to the U.S. territory of Guam.

Example 9 Expanding Discriminating Power of the Mitochondrial DNA Analysis by Examination of Regions Outside of HV1 and HV2

Twelve regions of human mtDNA (referred to as R1-R12) were selected for investigation based upon a relatively large number of differences between individual entries in 524 non-control-region human mitochondrial sequences obtained from Mitokor, Inc. (San Diego, Calif.). The initial twelve primer pairs (see Table 3—SEQ ID NOs: 18:19, 20:21, 22:23, 24:25, 26:27, 28:29, 30:31, 32:33, 34:35, 36:37, 38:39, and 40:41) were tested upon ˜1.6 ng of human blood-derived DNA (Seracare blood sample N31773) which was isolated as indicated in Example 1.

The PCR protocol and cycling conditions are as described in Example 1 with the exception that 4 U of Amplitaq Gold polymerase (Applied Biosystems, Foster City, Calif.) was used. The results of the reactions are shown in FIG. 8 which indicates that reproducible amplicons were obtained for all twelve non-control regions investigated.

Initial digestions with enzyme panels outlined in Example 2 were employed, and coverage maps were assembled by matching observed masses at ±4 ppm error to all sequences existing in the database as of Sep. 8, 2003: 524 Mitokor-obtained sequences and 444 mtDNA genomes from GenBank.

The total number of unique sequences found within 968 predicted amplicon sequences from Mitokor and GenBank for each of the 12 non-control region primer pairs shows that the greatest number of different sequences is found within regions R1, R3, R6, R7 and R9 (Table 6). When amplicon sequences are concatenated together as collinear sequences, the combination of R1, R3, R6 and R7 comes out on top, with 508 unique base count signatures out of 968 sequences predicted for the combination R1+R3+R6+R7 compared to 475 unique signatures predicted for the combination R1+R3+R9+R7. It was thus decided that regions R1, R3, R6 and R7 provide the best discriminating power. The numbers of unique sequences for each of these regions are denoted by an asterisk in Table 6.
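The comparison of region combinations described above amounts to counting how many distinct concatenated signatures a set of regions yields across the database. The following is a minimal sketch of that counting step on toy data; the profile layout and function names are hypothetical, and the single-letter signatures stand in for real base count signatures.

```python
from itertools import combinations

def unique_signatures(profiles, regions):
    """Count distinct concatenated signatures over the chosen regions."""
    return len({tuple(p[r] for r in regions) for p in profiles})

def best_combination(profiles, candidates, k):
    """Pick the k-region combination that yields the most distinct signatures."""
    return max(combinations(candidates, k),
               key=lambda combo: unique_signatures(profiles, combo))

# Toy database: four sequences, each with a per-region signature label.
profiles = [
    {"R1": "a", "R3": "x", "R6": "p"},
    {"R1": "a", "R3": "y", "R6": "p"},
    {"R1": "b", "R3": "x", "R6": "p"},
    {"R1": "b", "R3": "y", "R6": "q"},
]

best = best_combination(profiles, ["R1", "R3", "R6"], 2)
```

Here R1 and R3 together separate all four toy sequences, whereas either pairing with R6 separates only three.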

TABLE 6
Final Choices of Primers Optimized for Characterization of Non-Control Mitochondrial DNA Regions

mtDNA   REGION                                   RESTRICTION                    FORWARD  REVERSE  NO. OF UNIQUE  NO. OF UNIQUE
REGION  AMPLIFIED                                ENZYME PANEL                   SEQ ID   SEQ ID   BASE           SEQUENCES
                                                                                NO:      NO:      COMPOSITIONS
R1      COX2; Intergenic spacer; tRNA-Lys; ATP6  DdeI MseI HaeIII MboI          18       49       182            204*
R2      ND5                                      DdeI HaeIII MboI MseI          20       21       106            132
R3      ND6, tRNA-Glu; CYTB                      DdeI MseI MboI BanI            22       52       135            170*
R4      COX3; tRNA-Gly; ND3                      DdeI HpyCH4IV MseI HaeIII      24       25       94             132
R5      ND4L; ND4                                AluI BfaI MseI                 26       27       107            130
R6      CYTB; tRNA-Thr; tRNA-Pro                 DdeI HaeIII MboI MseI RsaI     57       29       118            143*
R7      ND5; ND6                                 DdeI HpaII HaeIII MseI         58       31       137            174*
R8      ND1                                      BfaI DdeI EcoRI MboI           32       33       88             122
R9      COX2; Intergenic spacer; tRNA-Lys; ATP6  BfaI DdeI HpaII HpyCH4IV MboI  34       35       118            145
R10     COX1                                     BfaI HpaII MboI                36       37       81             109
R11     COX2; Intergenic spacer; tRNA-Lys; ATP6  BfaI DdeI HpyCH4V MboI         38       39       113            136
R12     16S rRNA; ND1                            BfaI DdeI MseI                 40       43       65             79

The 12 regions were evaluated informatically by considering the total number of unique sequences in each region out of a database of 968 sequences, 524 of which were obtained from Mitokor, Inc, and 444 of which are human mitochondrial genomes obtained from GenBank. Coordinates are given in terms of the Anderson sequence (SEQ ID NO: 72). The number of unique base count signatures was determined by theoretical digestion of each of the 968 database sequences with the indicated enzymes.
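A theoretical digestion of the kind referred to above can be sketched in a few lines. This is an illustrative simplification, not the software used in this work: it handles only the top strand, uses two blunt-cutting enzymes (RsaI, GT^AC, and HaeIII, GG^CC), and treats a signature as the unordered collection of fragment base counts. The toy sequences and all names are hypothetical.

```python
from collections import Counter

# Recognition sequence and top-strand cut offset for two blunt cutters.
ENZYMES = {"RsaI": ("GTAC", 2), "HaeIII": ("GGCC", 2)}

def digest(seq, enzymes):
    """Cut seq at every recognition site of the named enzymes."""
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

def base_composition(fragment):
    """Return the (A, C, G, T) counts of a fragment."""
    counts = Counter(fragment)
    return tuple(counts[base] for base in "ACGT")

def signature(seq, enzymes):
    """Unordered base count signature of the digest of seq."""
    return tuple(sorted(base_composition(f) for f in digest(seq, enzymes)))

reference = "AAGTACCCGGCCTT"
variant = "AAGTACCCGGCTTT"   # toy SNP that destroys the HaeIII site

fragments = digest(reference, ["RsaI", "HaeIII"])
```

Counting distinct `signature` values across a database would give a number comparable in kind to the unique base count signatures described above, under the stated simplifications.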

Example 10 Sensitivity Assessed With Quantified Human Blood DNA

To measure sensitivity against total human genomic DNA, a preparation of DNA derived from whole human blood (Seracare blood sample N31774) was obtained using the procedure of Example 1. A stock of blood-derived DNA was quantitated to 1.6±0.06 ng/μl using the average of five independent concentration measurements taken with the Molecular Probes PicoGreen® Assay P-7589. 10-fold serial dilutions of human DNA were tested in PCR reactions according to Example 1 using the primer pairs of SEQ ID NOs: 12:15 (HV1) and SEQ ID NOs: 65:66 (HV2), starting with 1.6 ng/reaction and diluting to extinction (as a set of stock dilutions in double deionized H₂O) down to a calculated concentration of 160 zg/reaction (10 orders of magnitude dilution). No carrier DNA was used in these reactions.

FIG. 9 shows clear PCR product detection down to 1.6 pg/reaction for both HV1 and HV2 primer pairs, with possible stochastic detection of a faint product at 160 fg input template. It is typically estimated that a single human cell has approximately 3.3 billion base pairs, or 6.6 billion total bases, which corresponds roughly to 6-7 pg total DNA per cell. This suggests PCR detection of mtDNA targets down to single-cell or subcellular levels.

After digesting HV1 amplicons with RsaI, and HV2 amplicons with HaeIII, HpaII, HpyCH4IV and HpyCH4V, a full profile was recovered for HV2 with 16 pg input template, and for HV1 with 160 pg input template. Subsequent experiments have demonstrated full profile recovery for HV1 down to about 50 pg input template with human DNA from the same source. This represents an estimated 8 to 10 cells' worth of DNA.
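The dilution arithmetic and the conversion from template mass to approximate cell equivalents can be checked with a short sketch. The value of 6.6 pg per cell is an assumption taken from within the 6-7 pg range stated above, and the function names are hypothetical.

```python
PG_PER_CELL = 6.6  # assumed value within the ~6-7 pg/cell estimate in the text

def dilution_series(start_pg, steps, factor=10):
    """Template mass (pg) at each step of a serial dilution, undiluted first."""
    return [start_pg / factor ** k for k in range(steps + 1)]

def cell_equivalents(template_pg, pg_per_cell=PG_PER_CELL):
    """Approximate number of cells' worth of total DNA in a template mass."""
    return template_pg / pg_per_cell

series = dilution_series(1600.0, 10)       # 1.6 ng = 1600 pg, ten 10-fold steps
final_zg = series[-1] * 1e9                # 1 pg = 1e9 zg

cells_at_50_pg = cell_equivalents(50.0)    # HV1 full-profile input
cells_at_1_6_pg = cell_equivalents(1.6)    # lowest clear PCR detection
```

With this assumed per-cell mass, 50 pg corresponds to roughly 7.6 cell equivalents, close to the lower end of the 8 to 10 cell estimate, and 1.6 pg is well below one cell equivalent.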

Example 11 Characterization of Mitochondrial DNA from Human Hair and Specificity of HV1 and HV2 Primer Pairs in the Presence of Non-Human DNA

To test our ability to detect mitochondrial DNA from human hair shafts, and the specificity of our control-region primer targets in the presence of non-human mammalian DNA, DNA was extracted from washed human hair shafts (8, 4, 2, 1 and ½ cm), washed hamster, dog, and cat hair (4-6 cm) and washed human (2-3 cm) plus hamster, dog or cat hair (4-6 cm) present together in the same tube, according to the protocol outlined in Example 1. Hairs were taken by cutting with scissors, rather than pulling, to avoid including a hair root in the reactions. PCR reactions were carried out using the primer pairs of SEQ ID NOs: 12:15 (HV1) and SEQ ID NOs: 65:66 (HV2) with PCR conditions as outlined in Example 1. Duplicate PCR reactions demonstrated the presence of a PCR product of the expected size in the presence of human hair-derived DNA, but not in the negative controls (identical reactions, but with double deionized H₂O substituted for template) or with hamster, dog, or cat hair alone.

When these PCR products were digested with RsaI (HV1) and HaeIII, HpaII, HpyCH4IV, and HpyCH4V (HV2) as described in Example 2, a profile of base compositions matching Ibis internal blinded sample CS0022 was found for products amplified in the presence of animal hair and for human hair alone down to 2 cm.

Example 12 Characterization of Mitochondrial DNA Isolated from Four Non-Invasive Tissues (Cheek Swab, Hair, Fingernail and Saliva) from Three Independent Donors: Analysis for Consistency in Processed Mass Spectrometry Data

In this experiment, DNA was isolated from 3 pooled hairs of ˜2-3 cm length each from 3 donors (designated “F”, “M” and “J”) according to procedures outlined in Example 1. DNA from several (3-5) pooled small fingernail clippings was isolated from the same three donors according to Example 1 with the exception that there was no sonication step prior to DNA isolation, as this step was added at a later time. DNA from ˜0.5 ml saliva was isolated from the same three donors according to Example 1. These three donors were also part of the 16-donor cheek swab panel described in Example 8, and processed data from cheek swabs representing these donors existed before this experiment and were used for comparison to the three new tissue samplings.

PCR reactions were performed using 1 μl of template from each of the four sample preparations for each of the three donors according to Example 1 using primer pair SEQ ID NOs: 12:15 (HV1) and SEQ ID NOs: 8:9 (HV2). Restriction digestions were performed according to Example 2. To determine a truth base for each sample for this experiment, PCR reactions performed with primer pair SEQ ID NOs: 12:15 (HV1) and SEQ ID NOs: 8:9 (HV2) were purified with a QIAQuick PCR purification kit (according to Qiagen kit recommendations) and sequenced at Retrogen (San Diego).

Digestion results for the original cheek swab-derived products were first compared to the sequences determined for cheek-swab-derived amplicons for consistency. After confirming consistency between the determined sequence and the mass spectrometry derived fragment profile, the ability to qualitatively exclude each of the samples from the other two was evaluated by matching the processed mass data for the cheek swab-derived samples from each of the donors to theoretical digestions from the PCR-derived sequences corresponding to the other two donors.

Processed mass spectrometry data for samples derived from the four different tissue sources were then compared to the cheek-swab-derived sequence for each donor individually and found to be consistent across the four tissue types, with the exception that the length heteroplasmy observed in HV2 of both sample “M” and sample “J” was observed in only three of the four tissue samples. The length heteroplasmy was not observed in the hair-derived sample for either “M” or “J”.

Example 13 Validation of Mitochondrial DNA Analysis Process on Saliva Samples from 36 Volunteer Donors

In this validation experiment, 1 μl of each of the 36 Ibis samples (CS0001-CS0036) was PCR amplified in duplicate using each of the final primer pairs shown in Table 6 on two different days (four reactions were performed on each sample) using the cycling parameters indicated in Example 1. FIG. 10 shows one set of the 36 sample PCR reactions for the HV1 region. After PCR, 25 μl of each reaction was digested in 50 μl restriction digestion reactions as described in Example 2. Samples from each of the 12 PCR plates were then subjected to mass spectrometry and processed with the ICR-2LS software to produce monoisotopic neutral masses. Each set of mass data was scanned against the database individually at a ±4 ppm matching threshold, allowing for the possibility of a 1-dalton error on each mass determination.

One potential issue with the deconvolution from raw mass spectrometry data to exact mass determination is that the algorithm that fits a theoretical isotopic distribution to an observed distribution can occasionally predict the best fit with the distribution shifted by exactly one Dalton to the right or to the left of the true distribution, resulting in a mass determination that is exactly one Dalton off. This is not a serious issue when using mass data to verify consistency with a known sequence, because the expected base composition is known and two independent measurements are made on each double stranded fragment where each strand (top and bottom) is linked to the other in a highly constrained manner because of base complementarity. When using deconvolved numerical masses to make de novo base composition predictions, however, this must be dealt with properly to ensure a proper interpretation of match data. For example, the mass difference between an internal ‘C’ and an internal ‘T’ in a DNA sequence is −14.9997 Daltons. The mass difference between an internal ‘G’ and an internal ‘A’ is 15.9949 Daltons. Because of this, the mass difference between two strands of DNA that differ exactly by C→T + G→A is 0.9952 Daltons. Likewise, the reverse, T→C + A→G, is a difference of −0.9952 Daltons.
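The near-integer coincidence described above can be reproduced from elemental compositions. The sketch below assumes standard monoisotopic element masses and the usual elemental formulas for internal (mid-chain) deoxynucleotide residues, and it follows the sign convention of the text, in which each difference is the first-named base minus the second-named base. The names are hypothetical.

```python
# Monoisotopic masses of the elements (Daltons).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
        "O": 15.9949146, "P": 30.9737620}

# Elemental composition of internal deoxynucleotide residues
# (nucleoside monophosphate minus one water).
RESIDUE = {
    "A": {"C": 10, "H": 12, "N": 5, "O": 5, "P": 1},
    "C": {"C": 9,  "H": 12, "N": 3, "O": 6, "P": 1},
    "G": {"C": 10, "H": 12, "N": 5, "O": 6, "P": 1},
    "T": {"C": 10, "H": 13, "N": 2, "O": 7, "P": 1},
}

def residue_mass(base):
    """Monoisotopic mass of one internal residue."""
    return sum(MONO[element] * n for element, n in RESIDUE[base].items())

c_minus_t = residue_mass("C") - residue_mass("T")   # about -14.9997
g_minus_a = residue_mass("G") - residue_mass("A")   # about +15.9949
combined = c_minus_t + g_minus_a                    # about +0.9952
```

Because the combined difference is within a few thousandths of a Dalton of exactly 1, a pair of such substitutions can mimic a one-Dalton deconvolution shift at typical fragment masses.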

For this reason, all of the matching to the database is performed assuming this as a possibility on every strand. However, when two masses match perfectly to two complementary base compositions at <10 ppm error (we generally use a threshold of 5 ppm or less) both masses would simultaneously require a 1-dalton error, and both would be required to have the error shifted in the same direction, to match a base composition fitting the above scenario. To avoid the rare occurrence of this situation, replicate reactions are required to ensure reproducible results for a profile analysis.

After scanning the database to generate a list of all possible fragment matches for each mass at a ±4 ppm threshold and allowing a precise ±1 Dalton error on every mass, an automatic filter was applied that assumes that a pair of perfect matches to a complementary pair of base compositions overrides a match requiring a 1-dalton shift in the same direction on both strands (as described in the above paragraph). A second filter was applied to completely filter out ambiguous fragments where one mass actually did exhibit a one-Dalton shift error. This is easily spotted in an automated fashion, because two masses will only match a complementary set of base compositions with high precision if one of them is shifted by exactly 1 Dalton under this scenario. This can present ambiguity, however, because there is no de novo way to tell which mass has the error. Replicate reactions are relied upon to resolve this type of ambiguity (alternatively, a profile can be scanned with ambiguity in an “either-or” mode with little or no effect on the actual match result if enough fragments are present in a profile, much like using an ‘R’ to represent ‘A’ or ‘G’, or an ‘N’ to represent any nucleotide).
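The matching step with a ppm tolerance and an allowed one-Dalton shift can be sketched as below. This is only one possible reading of the filters described above, written for a single candidate pair of complementary strands; the function names, labels and toy masses are hypothetical and are not taken from the software used in this work.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return 1e6 * (observed - theoretical) / theoretical

def match_shift(observed, theoretical, tol_ppm=4.0):
    """Return the Dalton shift (0, +1 or -1) under which the observed mass
    matches the theoretical mass within tol_ppm, or None if none does."""
    for shift in (0, 1, -1):
        if abs(ppm_error(observed - shift, theoretical)) <= tol_ppm:
            return shift
    return None

def classify_pair(obs_top, obs_bottom, theo_top, theo_bottom, tol_ppm=4.0):
    """Label a candidate match of two observed strand masses to a
    complementary pair of theoretical strand masses."""
    s_top = match_shift(obs_top, theo_top, tol_ppm)
    s_bottom = match_shift(obs_bottom, theo_bottom, tol_ppm)
    if s_top is None or s_bottom is None:
        return "no match"
    if s_top == 0 and s_bottom == 0:
        return "perfect pair"              # preferred by the first filter
    if (s_top == 0) != (s_bottom == 0):
        return "one strand shifted"        # ambiguous; removed by the second filter
    return "both strands shifted"

label_ok = classify_pair(10000.02, 9500.01, 10000.0, 9500.0)
label_one = classify_pair(10001.01, 9500.01, 10000.0, 9500.0)
label_both = classify_pair(10001.01, 9501.00, 10000.0, 9500.0)
```

A candidate labeled "perfect pair" would take precedence over one labeled "both strands shifted", and "one strand shifted" candidates would be set aside for resolution by replicate reactions.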

The last step is to create a composite profile from the combination of pre-filtered matches in each reaction scenario. To do this, all of the unfiltered masses from each of the replicates in each reaction scenario (e.g., one reaction scenario would be HV1 PCR product digested with RsaI) were combined into one data set and used again against the entire database to regenerate a single composite profile. This operation provides the benefit of increasing sensitivity in that a fragment lost in one reaction can be picked up in another, and can help prevent ambiguous base composition assignments. The final step is to filter any ambiguous assignments from the composite profile before comparing profiles or scanning the database with a profile. Even in the very unlikely case that masses representing both strands of a fragment were Dalton-shifted in the same direction, the same fragment in a replicate reaction should disagree, which is the precautionary purpose of the final filtering step.

Table 7 summarizes the results of the database scans using the six-region profiles. It should be noted here that there was considerably more noise in the larger non-control-region spectra than in the spectra for the HV1 and HV2 regions. Although it did not detract from the ability to match the proper donor signature, it did produce more ambiguity in data processing than desired. The level of noise in this data set also did not cause a problem in the ability to differentiate samples from each other by at least one SNP, with the exception of samples CS0004, CS0025 and CS0032. Interestingly, one SNP in R1 differentiates CS0004 and CS0025 (which appear to be a very common mtDNA type when HV1 and HV2 are matched to the database), and it was detected only in the CS0025 profile. Therefore, CS0004 and CS0025 could not be resolved from each other by direct comparison (see next section), CS0004 hits equally to CS0004 and CS0025 in the database scan, and CS0025 appears to differentiate from CS0004 in a database scan (due to the fact that the profile is being compared to the known CS0004 sequence in the latter case, rather than the experimentally determined base composition profile that has a missing fragment). Two incorrect base compositions were predicted in CS0018 that were corrected by analysis of a duplicate set of restriction digestions. One incorrect base composition was predicted in each of samples CS0006, CS0011 and CS0026, each of which was likewise corrected by analysis of a duplicate set of restriction digestions. This did not change the top database hit (Table 7), nor does it change the ability of CS0001-CS0036 to be differentiated from CS0018, CS0006, CS0011, or CS0026.

TABLE 7 Overview of Validation Results % NO. OF NO. OF MATCHED SECOND BEST % NO. OF ID WITH BEST DATABASE FRAGMENTS MATCHING REFERENCE FRAGMENTS SECOND BEST SAMPLE MATCH MATCH HIGHEST % POSITIONS MATCHED FRAGMENTS MATCHED CS0001 CS0001 100 1 2942 90 2 CS0002 CS0002 100 1 3356 95.3 2 CS0003 CS0003 100 1 3294 90.7 2 CS0004 CS0004 100 6 2879 97.3 14 CS0025 CS0032 gi|17985669 gi|13272808 gi|7985543 CS0005 CS0005 100 1 3190 95.1 12 CS0006 CS0006 97.5 1 3088 92.5 2 CS0006 CS0006 100 1 3198 95 2 Re-anal. CS0007 CS0007 100 1 2940 87.2 11 CS0008 CS0008 100 1 3251 95.2 2 CS0009 CS0009 100 1 2617 89.2 6 CS0010 CS0010 100 2 3205 97.6 8 gi|32692659 CS0011 CS0011 97.7 1 3086 90.7 5 CS0011 CS0011 100 1 3028 92.7 5 Re-anal. CS0012 CS0012 100 1 3193 92.7 2 CS0013 CS0013 100 1 3016 87.5 4 CS0014 CS0014 100 1 3017 92.5 1 CS0015 CS0015 100 1 3378 95.3 1 CS0016 CS0016 100 1 2915 94.7 1 CS0017 CS0017 100 1 3229 92.9 3 CS0018 CS0018 94.9 1 2629 89.7 1 CS0018 CS0018 100 1 2691 94.4 1 Re-anal. CS0019 CS0019 100 1 2794 92.3 1 CS0020 CS0020 100 1 3231 92.9 8 CS0021 CS0021 100 1 2902 97.5 3 CS0022 CS0022 100 1 3314 95.3 10 CS0023 CS0023 100 1 2953 84.6 3 CS0024 CS0024 100 1 3224 87.8 1 CS0025 CS0025 100 3 3080 97.6 11 gi|3272808 gi|7985669 CS0026 CS0026 97.6 1 2787 90.2 1 CS0026 CS0026 100 2787 92.5 1 Re-anal. CS0027 CS0027 100 1 2940 94.9 4 CS0028 CS0028 100 1 2975 97.5 8 CS0029 CS0029 100 1 3002 92.7 4 CS0030 CS0030 100 1 3066 97.6 1 CS0031 CS0031 100 1 3409 86.4 7 CS0032 CS0032 100 2 3288 97.6 3 gi|17985543 CS0033 CS0033 100 1 3098 92.7 2 CS0034 CS0034 100 1 3100 85.4 2 CS0035 CS0035 100 3 2971 97.5 3 gi|32892351 gi|32892449 CS0036 CS0036 100 1 2703 91.9 3

Various modifications of the invention, in addition to those described herein, will be apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims. Each reference cited in the present application is incorporated herein by reference in its entirety. 

What is claimed is:
 1. A method of forensic analysis of a sample comprising mitochondrial DNA comprising: selecting a region of mitochondrial DNA comprising at least one restriction site whereat a restriction enzyme cleaves said mitochondrial DNA to produce a plurality of restriction fragments; populating a relational database of known mitochondrial DNA sequences with entries which correspond to theoretical restriction fragments obtained from theoretical digestion of each member of said database at said at least one restriction site; selecting a primer pair with which to amplify said region of mitochondrial DNA in said sample; amplifying said region of mitochondrial DNA in said sample to produce an amplification product; digesting said amplification product with at least one restriction enzyme to produce a plurality of restriction fragments; experimentally testing each member of said plurality of restriction fragments; and comparing said experimentally tested members with said theoretical digestion of each member of said database, wherein at least one match or lack of a match provides a forensic conclusion.
 2. The method of claim 1 wherein said region of mitochondrial DNA comprises HV1.
 3. The method of claim 2 wherein each member of said primer pair has at least 70% sequence identity with the sequence of the corresponding member of any one of the following primer pair sequences: SEQ ID NOs: 12:13, 12:14, 12:15, 16:17, 42:43, 42:46, 67:68, 69:70, 12:68, 12:70, 67:15, 71:70, 69:15 and 69:68.
 4. The method of claim 1 further comprising: populating said relational database of known mitochondrial DNA sequences with base compositions which correspond to theoretical restriction fragments obtained from theoretical digestion of each member of said database at said at least one restriction site; experimentally determining the base composition of each member of said plurality of restriction fragments from said experimentally determined molecular masses of each member of said plurality of restriction fragments; comparing said experimentally determined base compositions with the base compositions of said theoretical digestion of each member of said database wherein at least one match or lack of a match provides a forensic conclusion.
 5. The method of claim 1 wherein said amplifying step comprises polymerase chain reaction.
 6. The method of claim 5 wherein said polymerase chain reaction is catalyzed by a polymerase enzyme whose function is modified relative to a native polymerase.
 7. The method of claim 6 wherein said modified polymerase enzyme is exo(−) Pfu polymerase.
 8. The method of claim 6 wherein said modified polymerase catalyzes the addition of nucleotide residues to staggered restriction digest products to convert said staggered digest products to blunt-ended digest products.
 9. The method of claim 1 wherein said amplifying step comprises ligase chain reaction or strand displacement amplification.
 10. The method of claim 1 wherein said database is a human mtDNA population database.
 11. The method of claim 1 wherein said testing comprises mass determination by mass spectrometry.
 12. The method of claim 11 wherein said mass spectrometry is ESI-TOF mass spectrometry.
 13. The method of claim 1 further comprising repeating all steps of the method for at least one additional region of mitochondrial DNA.
 14. The method of claim 1 wherein said mitochondrial DNA is human mitochondrial DNA.
 15. The method of claim 1 wherein said mitochondrial DNA is animal mitochondrial DNA.
 16. The method of claim 1 wherein said mitochondrial DNA is fungal, parasitic, or protozoan DNA.
 17. The method of claim 1 wherein said amplified DNA is digested directly without purification.
 18. The method of claim 1 wherein said sample of mitochondrial DNA is obtained from saliva, hair, blood, or nail.
 19. The method of claim 1 wherein said plurality of restriction fragments are up to about 150 base pairs in length. 